Abstract

In this paper, we propose a two-timescale delay-optimal base station Discontinuous Transmission
(BS-DTX) control and user scheduling for downlink coordinated MIMO systems with energy
harvesting capability. To reduce the complexity and signaling overhead in practical systems, the 
BS-DTX control is adaptive to both the energy state information (ESI) and the data queue state 
information (QSI) over a longer timescale. The user scheduling is adaptive to the ESI, the QSI 
and the channel state information (CSI) over a shorter timescale. We show that the two-timescale 
delay-optimal control problem can be modeled as an infinite horizon average cost Partially Observed
Markov Decision Process (POMDP), which is well-known to be a difficult problem in general. By
using sample-path analysis and exploiting specific problem structure, we first obtain some structural 
results on the optimal control policy and derive an equivalent Bellman equation with reduced state 
space. To reduce the complexity and facilitate distributed implementation, we obtain a delay-aware 
distributed solution with the BS-DTX control at the BS controller (BSC) and the user scheduling 
at each cluster manager (CM) using approximate dynamic programming and distributed stochastic 
learning. We show that the proposed distributed two-timescale algorithm converges almost surely. 
Furthermore, using queueing theory, stochastic geometry and optimization techniques, we derive 
sufficient conditions for the data queues to be stable in the coordinated MIMO network and discuss 
various design insights. Finally, we compare the proposed algorithm with various baseline schemes 
and show that significant delay performance gain can be achieved. 

Index Terms 

delay-aware, base station discontinuous transmission control (BS-DTX), interference network,
renewable energy, energy harvesting system, distributed stochastic learning, queueing theory, stochastic
geometry. 
I. Introduction

Inter-cell interference is a critical performance bottleneck in cellular networks. The interference 
mitigation techniques can be roughly classified into two types, namely coordinated MIMO techniques
and cooperative MIMO techniques, according to the required backhaul consumption [1]. For
coordinated MIMO techniques, only the channel state information (CSI) is shared among MIMO
base stations (BSs) through backhaul for the coordinated beamforming design at each BS to combat
interference [2]. On the other hand, for cooperative MIMO techniques, both the CSI and the payload
data are shared among MIMO BSs through backhaul for joint precoder designs at all the BSs to
combat interference [3]. Since CSI sharing is performed for each transmission frame, while data
sharing is operated for each data symbol, coordinated MIMO consumes much less backhaul capacity 
than cooperative MIMO at the expense of performance (e.g., degrees of freedom). 

Due to the limited degrees of freedom and the limited backhaul capacity at each BS, global 
cooperation or coordination of all the BSs in the network is not possible and the BSs are organized 
into disjoint clusters [3]-[7]. The BSs within each cluster cooperatively serve the users associated
with them, which lowers the system complexity and completely eliminates intra-cluster interference. 
For example, in [4], multi-antenna BSs in each fixed cluster adopt coordinated beamforming to serve 
the single-antenna users in their own cells and avoid the interference to the users served by other 
BSs in the same cluster. In [3], [5], the authors propose a BS cooperation strategy for fixed clusters,
including full intra-cluster cooperation to eliminate intra-cluster interference and limited inter-cluster 
coordination to reduce the interference for the cluster edge users based on the per-cluster CSI and 
the CSI of the edge users in the neighboring clusters. In [6], [7], the authors consider different types
of static cluster-based cooperation schemes in a multi-cell system with multiple sectors per cell. 

However, all these works focus on physical layer performance (such as sum throughput, transport 
capacity) in cellular networks. They ignore the bursty data arrivals and assume infinite backlogs of 
packets at the transmitter. In other words, the information flows are assumed to be delay insensitive. 
The resulting control policy is adaptive to the CSI only and it cannot guarantee good delay performance 
for delay-sensitive applications El-El. In practice, a lot of applications have bursty arrivals and they 
are delay-sensitive. It is very important to take into account the delay performance in designing the 
cross-layer interference control algorithms for the coordinated MIMO systems. The control policy 
for delay-sensitive applications should be adaptive to both the CSI and the queue state information^1
(QSI). The motivation is illustrated by the example in Fig. 1(a). Under
cluster-based cooperative or coordinated MIMO, MSs only suffer from inter-cluster interference, as

^1 The CSI indicates good opportunities to transmit, whereas the QSI indicates the urgency
of the data flow.



March 2, 2013 DRAFT

intra-cluster interference is eliminated. Therefore, cluster edge MSs suffer much more interference 
than cluster center MSs. In this work, we are interested in investigating delay-aware BS-discontinuous
transmission (BS-DTX) control and user scheduling to reduce inter-cluster interference and save
energy of the whole network. To maximize the sum throughput, the CSI-based BS-DTX control and
user scheduling always favors cluster center MSs while starving cluster edge MSs. This may lead to
infinite delay for cluster edge MSs and hence infinite average delay over all the MSs. In contrast, the QSI
and CSI based design dynamically favors different types of MSs to capture both the urgency of data
flows and the good opportunities of channels. Therefore, it can guarantee good delay performance.
However, the design framework taking into account the queueing delay and the physical layer 
performance is far from trivial as it involves both queuing theory (to model the queuing dynamics) 
and information theory (to model the physical layer dynamics). 

In addition, recent initiatives towards green communications have driven the design of wireless 
infrastructure to be more energy-efficient. One energy-efficient design is to exploit renewable energy 
at BSs. There are many recent works on power management in energy harvesting networks. For 
example, in [8], [9], the authors extend the Lyapunov optimization framework to derive an efficient
energy management algorithm for energy harvesting networks. In [10], the authors consider dynamic
node activation in energy harvesting sensor networks and propose a simple threshold-based node 
activation policy to achieve near-optimal system throughput. Similarly, all these papers focus
on physical layer throughput performance, and the nodes are assumed to be powered by renewable
energy sources only, with infinite energy storage size.

In this paper, we consider a delay-optimal BS-DTX control and user scheduling algorithm in downlink
energy harvesting coordinated MIMO systems with limited renewable energy storage. Each BS 
is powered by both conventional grid and renewable power sources. There are various first-order 
technical challenges involved in solving the problem. 

• Renewable and Grid Power Control: The transmit power of a BS comes from both renewable 
and grid power sources, which have very different properties. For instance, the grid power has stable 
power supply but there is cost associated with it. On the other hand, the renewable power is virtually 
free but it has random supply and hence, an energy storage is needed for efficient utilization of 
the renewable energy. In practice, the energy storage has limited capacity and hence, the BS power 
control and user scheduling algorithm should be adaptive to the renewable energy state information 
(ESI) and the data QSI as well as the CSI. It is highly nontrivial to strike a balance between these 
factors in the control algorithm design. 

• Delay-aware Low Complexity Distributed Algorithm: While the delay-optimal control problem
can be cast into a Markov Decision Process (MDP), brute force solutions such as value iteration
and policy iteration suffer from the curse of dimensionality [11]. For example, a very large
state space (exponential in the number of users in the network) will be involved. In addition to the
complexity issue, the solution obtained will be centralized and it requires knowledge of global system
state information (ESI, QSI, CSI). However, this system state information is usually distributed
locally at various BSs and huge signaling overhead will be involved in collecting this information.
Therefore, it is highly desirable to obtain a delay-aware low complexity and distributed algorithm 
with guaranteed delay performance. 

• Performance Analysis: Besides algorithm development, it is important to analyze the system 
performance to understand how it is affected by the renewable energy storage size and the interference 
coupling in cellular networks. One challenge on the system performance analysis is the statistical 
characterization of interference. In [12], the authors study the coverage and rate of cellular networks
without BS coordination using stochastic geometry [13]. The locations of the BSs are modeled as a
homogeneous Poisson point process (PPP) and the locations of the mobile stations (MSs) are modeled 
as some independent (of the point process of BSs) point process. The analysis for coordinated MIMO 
network is more challenging due to the asymmetric topology induced by clustering. In addition, the 
analysis becomes more involved when queueing dynamics of data queues and renewable energy 
queues are considered. 

In this paper, considering the limited backhaul capacity and the latency in information exchange 
through backhaul in practical cellular systems [7], we adopt cluster-based coordinated^2 MIMO to
eliminate intra-cluster interference. We propose a two-timescale delay-aware BS-DTX control and
user scheduling for energy harvesting downlink coordinated MIMO systems as illustrated in Fig.
1(a). The BS-DTX control is adaptive to both the ESI and the QSI over a longer timescale. The
user scheduling is adaptive to the ESI, the QSI and the CSI over a shorter timescale. We show that
the two-timescale delay-optimal control problem can be modeled as an infinite horizon average cost
Partially Observed Markov Decision Process (POMDP), which is well-known to be a difficult problem
[14]. By using sample-path analysis and exploiting the specific problem structure, we first obtain some
structural results on the optimal control policy and derive an equivalent Bellman equation with reduced 
state space. To derive a distributed control policy, we approximate the Q-factor and potential function 
associated with the equivalent Bellman equation by the per-flow functions. The per-flow functions 
are estimated online using distributed stochastic learning at each BS. We prove the almost-sure 
convergence of the proposed distributed algorithm. Furthermore, using queueing theory, stochastic 
geometry and optimization techniques, we characterize the sufficient conditions for data queues in 
the coordinated MIMO networks to be stable. Based on the analysis, we discuss the impacts of the 

^2 The design framework proposed in this paper does not rely on specific physical layer transmission schemes and can be
easily extended to cluster-based cooperative MIMO. 
interference coupling and the size of renewable energy storage on network performance. Finally,
we compare the proposed algorithm with various baseline schemes and show that significant delay 
performance gain can be achieved. 

II. System Models 

In this section, we shall elaborate on the system architecture, the physical layer model as well as 
the bursty source model for the coordinated MIMO networks. 

A. Architecture of Downlink Distributed MIMO Systems 

We consider a downlink coordinated MIMO system consisting of $B$ multi-antenna BSs and $K$
single-antenna MSs as illustrated in Fig. 1(a). Each BS has $N_t$ transmit antennas. Let $\mathcal{K}_b$ denote the
set of $K_b$ MS indices associated with the $b$-th BS and $\mathcal{K}$ denote the set of $K$ MS indices in
the network. The set of BSs $\mathcal{B} = \{1, \cdots, B\}$ is partitioned into $N = B/N_t$ coordination clusters^3,
i.e., $\mathcal{B} = \cup_{n=1}^{N} \mathcal{B}_n$ and $\mathcal{B}_n \cap \mathcal{B}_{n'} = \emptyset$, $\forall n \neq n'$, where $\mathcal{B}_n$ denotes the set of BSs in cluster $n$. Each
coordination cluster contains $N_t$ neighboring BSs and is managed by a cluster manager (CM) and
all the $N$ CMs are managed by a BS controller (BSC). The BSs in the same cluster share the CSI
and perform coordinated beamforming [4] to combat intra-cluster interference. Besides the conventional
grid power source, each BS is able to harvest energy from the environment, e.g., using solar panels
[15]. At each BS, there is a renewable energy queue (battery) with limited capacity for storing the
harvested energy. In addition, at each BS, there are multiple data queues for buffering the packets to
all the MSs associated with the BS (one queue for each MS) as illustrated in Fig. 1(a).
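
The clustering above can be sketched in a few lines of Python. This is only an illustration: the function name `partition_clusters` is hypothetical, and grouping consecutive BS indices into a cluster is an assumption standing in for "neighboring" BSs, since the paper does not specify how neighbors are indexed.

```python
def partition_clusters(num_bs: int, nt: int) -> list[list[int]]:
    """Split BS indices 1..num_bs into num_bs // nt disjoint clusters of nt BSs.

    Assumption: consecutive indices are treated as neighboring BSs.
    """
    if nt <= 0 or num_bs % nt != 0:
        raise ValueError("num_bs must be a multiple of a positive nt")
    return [list(range(start, start + nt)) for start in range(1, num_bs + 1, nt)]


# Example: B = 6 BSs with N_t = 2 antennas each gives N = 3 clusters.
clusters = partition_clusters(num_bs=6, nt=2)
```

The clusters are disjoint and their union is the full BS set, matching the partition conditions stated in the text.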

B. Physical Layer Model 

Let $\mathbf{h}_{k,b} \in \mathcal{H}$ and $L_{k,b}$ denote the $N_t \times 1$ complex small-scale fading vector and the long-term path
gain between the $b$-th BS and the $k$-th MS, where $\mathcal{H} \subset \mathbb{C}^{N_t \times 1}$ denotes the finite discrete complex CSI
state space. Let $\mathbf{H}_n = \{\mathbf{h}_{k,b} : k \in \mathcal{K}_b, b \in \mathcal{B}_n\} \in \boldsymbol{\mathcal{H}}_n$ and $\mathbf{H} = \cup_{n=1}^{N} \mathbf{H}_n \in \boldsymbol{\mathcal{H}}$
denote the intra-cluster CSI at the $n$-th CM and the aggregation of the CSI over the $N$ clusters, respectively.
In this paper, the time dimension is partitioned into scheduling slots indexed by $t$ with slot duration
$\tau$ (seconds).

Assumption 1 (Quasi-static Fading): $\mathbf{h}_{k,b}(t)$ is quasi-static in each scheduling slot for all $(k,b) \in
\mathcal{K} \times \mathcal{B}$. Furthermore, each element of the vector $\mathbf{h}_{k,b}(t)$ follows a general distribution with mean 0 and
variance 1. Each element of $\mathbf{h}_{k,b}(t)$ is i.i.d. over scheduling slots and
independent w.r.t. $\{k, b\}$. The long-term path gain $L_{k,b}$ remains constant for the duration of the
communication session. ■

^3 For simplicity, we assume $B$ is a multiple of $N_t$.
We assume all the BSs in the system share a common spectrum. Let $p_b \in \mathcal{P} = \{0, 1\}$ denote the
binary BS-DTX control action of the $b$-th BS, where $p_b = 1$ indicates the $b$-th BS is active and $p_b = 0$
otherwise. Between the coordination clusters, the inter-cluster interference is managed by a binary
BS-DTX control action $\mathbf{p} = \{p_b : p_b \in \mathcal{P}, b \in \mathcal{B}\} \in \boldsymbol{\mathcal{P}}$, where $\boldsymbol{\mathcal{P}} \subseteq \mathcal{P}^B$ is the aggregate BS-DTX
control action space and specifies the BS-DTX patterns [16]. Since each BS has renewable and grid
power sources, we have $p_b = p_b^E + p_b^G$, where $p_b^E \in \mathcal{P}$ and $p_b^G \in \mathcal{P}$ denote the power contribution
from the renewable power and grid power sources of the $b$-th BS, respectively. Let $s_k \in \mathcal{S} = \{0, 1\}$
denote the user scheduling action of the $k$-th MS, where $s_k = 1$ indicates the $k$-th MS is selected to
receive packets and $s_k = 0$ otherwise. Thus, users are selected according to a user scheduling action
$\mathbf{s} = \{s_k : s_k \in \mathcal{S}, k \in \mathcal{K}\} \in \boldsymbol{\mathcal{S}}$, where $\boldsymbol{\mathcal{S}} \subseteq \mathcal{S}^K$ is the aggregate user scheduling action space. The
BS-DTX control and user scheduling are performed according to a control policy to be defined in
Definition 1.

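In each slot, each active BS selects one MS to serve. Within each coordination cluster, the active
BSs combat the intra-cluster interference using coordinated beamforming HI, [El, flU. Let $P_b$ and
$x_k$ denote the instantaneous transmit power of the $b$-th BS and the information symbols for the $k$-th MS,
respectively. The received signal at the $k$-th MS of the $b$-th cell in the $n$-th cluster is given by

$$
y_k = \underbrace{p_b \sqrt{P_b} \sqrt{L_{k,b}}\, \mathbf{h}_{k,b}^{\dagger} \mathbf{w}_{k,b} s_k x_k}_{\text{desired signal}}
+ \underbrace{\sum_{b' \in \mathcal{B}_n, b' \neq b} p_{b'} \sqrt{P_{b'}} \sqrt{L_{k,b'}}\, \mathbf{h}_{k,b'}^{\dagger} \sum_{k' \in \mathcal{K}_{b'}} \mathbf{w}_{k',b'} s_{k'} x_{k'}}_{\text{intra-cluster interference}}
$$
$$
\qquad + \underbrace{\sum_{n' \neq n} \sum_{b' \in \mathcal{B}_{n'}} p_{b'} \sqrt{P_{b'}} \sqrt{L_{k,b'}}\, \mathbf{h}_{k,b'}^{\dagger} \sum_{k' \in \mathcal{K}_{b'}} \mathbf{w}_{k',b'} s_{k'} x_{k'}}_{\text{inter-cluster interference}} + z_k, \quad k \in \mathcal{K}_b,\ b \in \mathcal{B}_n
$$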

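where $z_k \sim \mathcal{CN}(0, N_0)$ is the AWGN noise and $\mathbf{w}_{k,b} \in \mathbb{C}^{N_t \times 1}$ is the zero-forcing beamforming
weight for the $k$-th MS at the $b$-th BS. Specifically, $\{\mathbf{w}_{k,b}\}$ is given by the solution^4 of the zero-forcing
problem: $\sum_{k \in \mathcal{K}_b} \mathbf{w}_{k,b}^{\dagger} \mathbf{w}_{k,b} s_k = p_b$ and $s_k \mathbf{h}_{k,b'}^{\dagger} \left(\sum_{k' \in \mathcal{K}_{b'}} \mathbf{w}_{k',b'} s_{k'}\right) = 0$ ($\forall b' \in \mathcal{B}_n, b' \neq b$).
The receive SINR at the $k$-th MS of the $b$-th cell in the $n$-th cluster is given by

$$\rho_k(\mathbf{H}, \mathbf{p}, \mathbf{s}) = \frac{P_k^{rx}}{N_0 + I_k}, \quad k \in \mathcal{K}_b,\ b \in \mathcal{B}_n \tag{1}$$

where the receive power $P_k^{rx}$ and the inter-cluster interference power $I_k$ are given by

$$P_k^{rx} = p_b P_b L_{k,b} \left|\mathbf{h}_{k,b}^{\dagger} \mathbf{w}_{k,b}\right|^2 s_k \tag{2}$$

$$I_k = \sum_{n' \neq n} \sum_{b' \in \mathcal{B}_{n'}} p_{b'} P_{b'} L_{k,b'} \left|\mathbf{h}_{k,b'}^{\dagger} \left(\sum_{k' \in \mathcal{K}_{b'}} \mathbf{w}_{k',b'} s_{k'}\right)\right|^2 \tag{3}$$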
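
As a rough numerical illustration of (1)-(3), the following Python sketch evaluates the receive power, the inter-cluster interference and the SINR for given channel vectors and beamforming weights. The function names and the toy numbers are hypothetical, and the beamforming weights are taken as given rather than computed from the zero-forcing problem.

```python
def inner(h, w):
    """Return h^dagger w for two equal-length complex vectors."""
    return sum(hc.conjugate() * wc for hc, wc in zip(h, w))


def receive_power(p_b, P_b, L_kb, h_kb, w_kb, s_k):
    """Receive power as in (2): p_b * P_b * L_kb * |h^dagger w|^2 * s_k."""
    return p_b * P_b * L_kb * abs(inner(h_kb, w_kb)) ** 2 * s_k


def interference_power(terms):
    """Inter-cluster interference as in (3).

    terms: one tuple (p_b', P_b', L_kb', h_kb', w_sum_b') per BS in the other
    clusters, where w_sum_b' is the sum of the scheduled users' weights.
    """
    return sum(p * P * L * abs(inner(h, w_sum)) ** 2 for p, P, L, h, w_sum in terms)


def sinr(p_rx, interference, n0):
    """SINR as in (1)."""
    return p_rx / (n0 + interference)


# Toy example with N_t = 2 antennas (all numbers are made up).
p_rx = receive_power(1, 2.0, 0.5, [1 + 0j, 0j], [1 + 0j, 0j], 1)
i_k = interference_power([
    (1, 2.0, 0.125, [1j, 0j], [1 + 0j, 0j]),  # active BS in another cluster
    (0, 2.0, 0.5, [1 + 0j, 0j], [1 + 0j, 0j]),  # BS switched off by DTX
])
rho = sinr(p_rx, i_k, n0=0.25)
```

Note how a BS with $p_{b'} = 0$ contributes no interference, which is the mechanism by which BS-DTX reduces inter-cluster interference.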

We have the following assumption regarding packet transmission.

^4 If there is more than one solution, we choose the one that maximizes $|\mathbf{h}_{k,b}^{\dagger} \mathbf{w}_{k,b}|$.
Assumption 2 (Packet Transmission Model): One data packet with a certain fixed packet size can be
successfully received by the $k$-th MS if the receive SINR $\rho_k$ exceeds a certain threshold^5 $\delta_k$, i.e.,
$\rho_k \geq \delta_k$. There exists a state-action pair $(\mathbf{H}, \mathbf{p}, \mathbf{s}) \in \boldsymbol{\mathcal{H}} \times \boldsymbol{\mathcal{P}} \times \boldsymbol{\mathcal{S}}$ such that $\Pr[\rho_k(\mathbf{H}, \mathbf{p}, \mathbf{s}) \geq \delta_k] > 0$. ■



C. Bursty Source Model and Queue Dynamics 

Let $\mathbf{A}^Q(t) = \{A_k^Q(t) : k \in \mathcal{K}\}$ and $\mathbf{A}^E(t) = \{A_b^E(t) : b \in \mathcal{B}\}$ be the number of packets arriving
to the $K$ MSs and the number of renewable energy units^6 arriving to the $B$ BSs at the end of the
$t$-th scheduling slot, respectively. We have the following assumptions^7 regarding the bursty data and
renewable energy arrival processes.

Assumption 3 (Bursty Data Source Model): The arrival process $A_k^Q(t)$ is i.i.d. over scheduling
slots and independent w.r.t. $k$ according to a general distribution $P_{A_k^Q}(\cdot)$ with average arrival rate
$\mathbb{E}[A_k^Q(t)] = \lambda_k^Q < 1$. The statistics of $A_k^Q(t)$ are unknown to the controller. ■

Assumption 4 (Bursty Renewable Energy Model): The arrival process $A_b^E(t)$ is i.i.d. over scheduling
slots and independent w.r.t. $b$ according to a general distribution $P_{A_b^E}(\cdot)$ with average arrival rate
$\mathbb{E}[A_b^E(t)] = \lambda_b^E < 1$. The statistics of $A_b^E(t)$ are unknown to the controller. ■

Remark 1 (Interpretation of Assumption 4): Assumption 4 implies that the renewable power source
is stationary. Although the renewable energy source is not stationary over a very long time horizon
in practice, it is stationary over a typical communication session, which lasts for less than 30 minutes. ■

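Let $\mathbf{Q}_n(t) = \{Q_k(t) : k \in \mathcal{K}_n\} \in \boldsymbol{\mathcal{Q}}_n$ be the $n$-th cluster QSI and $\mathbf{Q}(t) =
\cup_{n=1}^{N} \mathbf{Q}_n(t) \in \boldsymbol{\mathcal{Q}} = \mathcal{Q}^K$ be the aggregation of the QSI over the $N$ clusters at the beginning of the $t$-th
slot, where $Q_k(t) \in \mathcal{Q} = \{0, 1, \cdots, N_Q\}$ denotes the number of data packets at the data queue for
the $k$-th MS and $N_Q$ denotes the data buffer size. At slot $t$, there is $\mathbb{I}[\rho_k(t) \geq \delta_k] \in \{0, 1\}$ packet
successfully received at the $k$-th MS, where $\mathbb{I}[\cdot]$ denotes the indicator function. Hence, the data queue
dynamics of the $k$-th MS is given by

$$Q_k(t+1) = \min\left\{\left[Q_k(t) - \mathbb{I}[\rho_k(t) \geq \delta_k]\right]^+ + A_k^Q(t),\ N_Q\right\}, \quad k \in \mathcal{K} \tag{4}$$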

^5 In general, we allow different MSs to have different packet sizes; hence the threshold is indexed by $k$ and may
differ across MSs.

^6 One unit of energy for the $b$-th BS corresponds to the amount of energy consumed in downlink transmission at each
slot for the $b$-th BS, i.e., $P_b \tau$ Joule. Note that the instantaneous transmit power from the renewable power source is finite
(i.e., $P_b$). The notion "unit of energy" can be easily extended from binary (on-off) power control to handle (multi-level)
power control.

^7 Note that under Assumption 3 and Assumption 4, we have $\Pr[A_k^Q(t) = 0] > 0$ and $\Pr[A_b^E(t) = 0] > 0$ for all $k \in \mathcal{K}$
and $b \in \mathcal{B}$, respectively.
where $\rho_k(t) = \rho_k(\mathbf{H}(t), \mathbf{p}(t), \mathbf{s}(t))$ and $[x]^+ = \max\{x, 0\}$.
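
The data queue recursion in (4) can be sketched as a one-slot update in Python. The function and argument names are illustrative and not taken from the paper; the SINR value is assumed to be computed elsewhere.

```python
def data_queue_step(q, rho, delta, arrivals, n_q):
    """One-slot data queue update following (4).

    q: current queue length, rho: receive SINR in this slot,
    delta: SINR threshold, arrivals: packets arriving at the end of the slot,
    n_q: data buffer size.
    """
    served = 1 if rho >= delta else 0  # at most one packet departs per slot
    return min(max(q - served, 0) + arrivals, n_q)
```

For example, with `q = 3`, a successful transmission and one arrival, the queue stays at 3; a full buffer with `q = n_q` and an SINR below the threshold drops any new arrivals.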

Similarly, let $\mathbf{E}_n(t) = \{E_b(t) : b \in \mathcal{B}_n\} \in \boldsymbol{\mathcal{E}}_n$ be the $n$-th cluster ESI and $\mathbf{E}(t) =
\cup_{n=1}^{N} \mathbf{E}_n(t) \in \boldsymbol{\mathcal{E}} = \mathcal{E}^B$ be the aggregation of the ESI over the $N$ clusters at the beginning of the $t$-th
slot, where $E_b(t) \in \mathcal{E} = \{0, 1, \cdots, N_E\}$ denotes the number of renewable energy units in the energy
queue for the $b$-th BS and $N_E$ denotes the energy storage size. At slot $t$, $p_b^E(t) \in \mathcal{P}$ unit of renewable
energy is consumed from the $b$-th energy queue for packet transmission. Hence, the energy queue
dynamics of the $b$-th BS is given by

$$E_b(t+1) = \min\left\{\left[E_b(t) - p_b^E(t)\right]^+ + A_b^E(t),\ N_E\right\}, \quad b \in \mathcal{B} \tag{5}$$
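
The energy queue recursion in (5) has the same shape as the data queue update and can be sketched as follows. The names are illustrative only.

```python
def energy_queue_step(e, p_renew, harvested, n_e):
    """One-slot energy queue update following (5).

    e: current stored energy units, p_renew: renewable DTX action (0 or 1),
    harvested: energy units arriving at the end of the slot,
    n_e: energy storage size.
    """
    return min(max(e - p_renew, 0) + harvested, n_e)
```

When the battery is nearly full, harvested units beyond the storage size are lost: `energy_queue_step(5, 1, 3, 5)` returns 5, so two of the three harvested units are wasted.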

D. BS-DTX Control and User Scheduling Policy 

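For notational convenience, we denote $\boldsymbol{\chi}(t) = (\mathbf{E}(t), \mathbf{Q}(t), \mathbf{H}(t)) \in \boldsymbol{\mathcal{X}} = \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}} \times \boldsymbol{\mathcal{H}}$ as the global
system state at the $t$-th slot. We first define the centralized control policy. Specifically, at the beginning
of each slot, the controller determines the renewable power DTX control action $\mathbf{p}^E = \{p_b^E : p_b^E \in
\mathcal{P}, b \in \mathcal{B}\} \in \boldsymbol{\mathcal{P}}$, the grid power DTX control action $\mathbf{p}^G = \{p_b^G : p_b^G \in \mathcal{P}, b \in \mathcal{B}\} \in \boldsymbol{\mathcal{P}}$ as well as the user
scheduling action $\mathbf{s} = \{s_k : s_k \in \mathcal{S}, k \in \mathcal{K}\} \in \boldsymbol{\mathcal{S}}$ based on the global system state $\boldsymbol{\chi}$ according to
the control policy defined below.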

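Definition 1 (BS-DTX Control and User Scheduling Policy): A BS-DTX control and user scheduling
policy consists of a sequence of mappings $\pi = \{\Omega^1, \Omega^2, \cdots\}$. The mapping for the $t$-th slot
$\Omega^t = (\Omega_p^{E,t}, \Omega_p^{G,t}, \Omega_s^t)$ is a mapping from the system state $\boldsymbol{\chi}(t) \in \boldsymbol{\mathcal{X}}$ to the renewable power DTX
control action $\Omega_p^{E,t}(\mathbf{E}(t), \mathbf{Q}(t)) = \mathbf{p}^E(t) \in \boldsymbol{\mathcal{P}}$, the grid power DTX control action $\Omega_p^{G,t}(\mathbf{E}(t), \mathbf{Q}(t)) =
\mathbf{p}^G(t) \in \boldsymbol{\mathcal{P}}$ and the user scheduling action $\Omega_s^t(\boldsymbol{\chi}(t)) = \mathbf{s}(t) \in \boldsymbol{\mathcal{S}}$. A policy $\pi$ is called feasible if for
all $t$, the following constraints are satisfied:

1) $p_b^E(t) = 0$ if $E_b(t) = 0$ for all $b \in \mathcal{B}$ (no renewable energy available for transmission).

2) $p_b(t) = p_b^E(t) + p_b^G(t) \in \mathcal{P}$ for all $b \in \mathcal{B}$ (binary BS-DTX control).

3) $\sum_{k \in \mathcal{K}_b} s_k(t) = p_b(t)$ for all $b \in \mathcal{B}$ (each active BS selects one MS in its cell). ■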
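
The three feasibility constraints can be checked mechanically. The sketch below assumes a hypothetical dictionary-based representation (BS index to value, MS index to value), which is not part of the paper, and reads constraint 3 as an equality so that each active BS serves exactly one MS and each idle BS serves none.

```python
def is_feasible(energy, p_renew, p_grid, s, cells):
    """Check constraints 1)-3) of Definition 1 for one slot.

    energy, p_renew, p_grid: dicts keyed by BS index.
    s: dict keyed by MS index (0/1 scheduling decisions).
    cells: dict mapping each BS index to the list of its MS indices.
    """
    for b, users in cells.items():
        # 1) no renewable power can be drawn from an empty energy queue
        if energy[b] == 0 and p_renew[b] != 0:
            return False
        # 2) the total DTX action must be binary
        p_b = p_renew[b] + p_grid[b]
        if p_b not in (0, 1):
            return False
        # 3) an active BS serves exactly one MS, an idle BS serves none
        if sum(s[k] for k in users) != p_b:
            return False
    return True


cells = {1: [1, 2], 2: [3]}
ok = is_feasible({1: 2, 2: 0}, {1: 1, 2: 0}, {1: 0, 2: 1}, {1: 1, 2: 0, 3: 1}, cells)
```

Here BS 1 transmits from its battery and BS 2, whose battery is empty, transmits on grid power, so `ok` is `True`.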

Remark 2 (Motivation of Two-Timescale Control Policy): The two-timescale control is a constraint 
we impose due to the following practical reasons. The QSI and ESI are changing on a longer timescale 
(e.g., several slots) while the CSI is changing on a shorter timescale (e.g., per-slot). The BS-DTX 
control is usually implemented at the BSC for interference reduction and energy saving of the whole 
network. As a result, the BS-DTX control cannot afford to be running on a per-slot basis, due to the 
high complexity and signaling overhead in collecting the local CSI from all the BSs. Therefore, it 
is desirable to make it a function of the ESI and QSI only. On the other hand, the low complexity 
distributed user scheduling is implemented locally at each CM (similar to HSDPA in current 3G
networks) and can afford to run on a per-slot basis and adapt to the ESI, QSI and CSI. ■
III. Problem Formulation and Optimal Solution

In this section, we shall first elaborate on the dynamics of the system state under a control policy
$\pi$. Based on that, we shall formulate the delay-optimal control problem and derive some structural
properties for the optimal solution. 



A. Delay-Optimal Problem Formulation 

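Under Assumptions 1, 3 and 4, the induced random process $\{\boldsymbol{\chi}(t)\}$ for a given feasible control
policy $\pi = \{\Omega^1, \Omega^2, \cdots\}$ is a Markov chain with the following transition probability

$$\Pr[\boldsymbol{\chi}(t+1) \mid \boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t))]$$
$$= \Pr[\mathbf{H}(t+1) \mid \boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t))] \Pr[\mathbf{E}(t+1) \mid \boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t))] \Pr[\mathbf{Q}(t+1) \mid \boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t))]$$
$$= \Pr[\mathbf{H}(t+1)] \Pr[\mathbf{E}(t+1) \mid \boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t))] \Pr[\mathbf{Q}(t+1) \mid \boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t))] \tag{6}$$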

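As a result, given a feasible control policy $\pi$, the average delay cost per stage of the $k$-th MS
starting from a given initial state $\boldsymbol{\chi}(1)$ is given by

$$\bar{D}_k^{\pi}(\boldsymbol{\chi}(1)) = \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^{\pi}\left[\sum_{t=1}^{T} f(Q_k(t))\right] \tag{7}$$

where the expectation is taken w.r.t. the measure induced by the policy $\pi$ and $f(Q_k)$ is a monotonically
increasing utility function of $Q_k$. For example, with $f(Q_k) = Q_k / \lambda_k^Q$ and $f(Q_k) = \mathbb{I}[Q_k \geq Q_k^o]$ ($Q_k^o \in
\{0, \cdots, N_Q\}$), (7) can be used to measure the average delay and the average queue outage probability
of the $k$-th MS under policy $\pi$. Similarly, given a feasible control policy $\pi$, the average grid power
cost per stage of the $b$-th BS starting from a given initial state $\boldsymbol{\chi}(1)$ is given by

$$\bar{P}_b^{G,\pi}(\boldsymbol{\chi}(1)) = \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^{\pi}\left[\sum_{t=1}^{T} p_b^G(t)\right] \tag{8}$$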



We are interested in minimizing the average delay cost of each MS $k \in \mathcal{K}$ in (7) and the average
grid power cost of each BS $b \in \mathcal{B}$ in (8). A Pareto optimal tradeoff between the average delay and average
grid power consumption can be obtained by solving the following problem.

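Problem 1 (Two-Timescale Delay-Optimal Control): For some positive constants $\boldsymbol{\beta} = \{\beta_k > 0 :
k \in \mathcal{K}\}$ and $\boldsymbol{\gamma} = \{\gamma_b > 0 : b \in \mathcal{B}\}$, the delay-optimal problem is formulated as

$$\min_{\pi} J_{\beta,\gamma}^{\pi}(\boldsymbol{\chi}(1)) = \sum_{k \in \mathcal{K}} \beta_k \bar{D}_k^{\pi}(\boldsymbol{\chi}(1)) + \sum_{b \in \mathcal{B}} \gamma_b \bar{P}_b^{G,\pi}(\boldsymbol{\chi}(1)) = \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}^{\pi}\left[\sum_{t=1}^{T} g(\boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t)))\right] \tag{9}$$

where $g(\boldsymbol{\chi}(t), \Omega^t(\boldsymbol{\chi}(t))) = \sum_{k \in \mathcal{K}} \beta_k f(Q_k(t)) + \sum_{b \in \mathcal{B}} \gamma_b p_b^G(t)$ and the control policy $\pi$ satisfies the
two-timescale requirement in Definition 1. ■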
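
To make the objective concrete, the following sketch evaluates the per-stage cost $g$ (weighted queue utilities plus weighted grid power actions) and its empirical average over a finite sample path. The function names, the dictionary representation and the numbers are hypothetical; a finite-horizon average is only an estimate of the infinite-horizon average cost.

```python
def per_stage_cost(queues, p_grid, beta, gamma, f):
    """Per-stage cost: sum_k beta_k f(Q_k) + sum_b gamma_b p_b^G."""
    delay_part = sum(beta[k] * f(q) for k, q in queues.items())
    power_part = sum(gamma[b] * p for b, p in p_grid.items())
    return delay_part + power_part


def average_cost(trace, beta, gamma, f):
    """Empirical average cost over a list of (queues, p_grid) slot records."""
    costs = [per_stage_cost(q, pg, beta, gamma, f) for q, pg in trace]
    return sum(costs) / len(costs)


beta = {1: 1.0, 2: 1.0}
gamma = {1: 0.5}
trace = [({1: 2, 2: 0}, {1: 1}), ({1: 1, 2: 1}, {1: 0})]
avg = average_cost(trace, beta, gamma, f=lambda q: q)
```

A larger $\gamma_b$ penalizes grid power more heavily and a larger $\beta_k$ penalizes the backlog of the $k$-th MS more heavily, which is how the weights trace out the delay-power tradeoff.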



Remark 3 (Two-Timescale Control and POMDP): By the two-timescale requirement, the BS-DTX control
policy is defined on the partial system state $(\mathbf{E}, \mathbf{Q})$, while the user scheduling policy is defined on
the complete system state $\boldsymbol{\chi} = (\mathbf{E}, \mathbf{Q}, \mathbf{H})$. Due to the two-timescale control constraint in Definition
1, Problem 1 is a POMDP^8. ■



B. Policy and State Space Reduction 

Problem 1 is a POMDP, which is well known to be a challenging problem in general. Yet,
we shall exploit some special structures in our problem to reduce the policy and state spaces. Based
on that, we can simplify the POMDP. We first have the following lemma on the structural property 
of the BS-DTX control, which helps to reduce the policy space. 

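Lemma 1 (Structure of Optimal BS-DTX Control): Let the BS-DTX control for the $t$-th slot be
denoted by $\Omega_p^t : \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}} \to \boldsymbol{\mathcal{P}}$, which is a mapping from the partial system state $(\mathbf{E}, \mathbf{Q}) \in \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}}$
to the BS-DTX control action $\Omega_p^t(\mathbf{E}(t), \mathbf{Q}(t)) = \mathbf{p}(t) \in \boldsymbol{\mathcal{P}}$. Conditioned on any $\Omega_p^t$, the optimal
$\Omega_p^{E,t}$ and $\Omega_p^{G,t}$ satisfy $\Omega_{p,b}^{E,t}(\mathbf{E}(t), \mathbf{Q}(t)) = \Omega_{p,b}^{t}(\mathbf{E}(t), \mathbf{Q}(t))\, \mathbb{I}[E_b(t) > 0]$ and $\Omega_{p,b}^{G,t}(\mathbf{E}(t), \mathbf{Q}(t)) =
\Omega_{p,b}^{t}(\mathbf{E}(t), \mathbf{Q}(t))\, \mathbb{I}[E_b(t) = 0]$ for all $b \in \mathcal{B}$ and all $t$. ■
Proof: Please refer to Appendix A. ■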

Remark 4 (Interpretation of Lemma 1): Lemma 1 indicates that we are inclined to consume renewable
power first. This is because renewable power is free while grid power has cost. In addition,
due to the limited energy storage size, we may suffer from renewable energy loss when the energy 
queue size is large. Therefore, it is preferable to keep the size of the energy queue small. ■ 
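
The renewable-first rule can be written as a small helper that splits an aggregate BS-DTX action into its renewable and grid parts: an active BS draws from its battery whenever the battery is non-empty and falls back to grid power only when it is empty. The function name and dictionary layout are illustrative assumptions.

```python
def split_power(p, energy):
    """Split a binary DTX action per BS into (renewable, grid) parts.

    p: dict BS index -> 0/1 DTX action.
    energy: dict BS index -> stored renewable energy units.
    """
    p_renew = {b: p[b] if energy[b] > 0 else 0 for b in p}
    p_grid = {b: p[b] if energy[b] == 0 else 0 for b in p}
    return p_renew, p_grid


p_renew, p_grid = split_power({1: 1, 2: 1, 3: 0}, {1: 3, 2: 0, 3: 2})
```

By construction the two parts always sum to the original action, so only the aggregate action needs to be optimized.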

Based on Lemma 1, without loss of optimality, we can first solve Problem 1 over a reduced policy
$\pi = \{\Omega^1, \Omega^2, \cdots\}$, where $\Omega^t = (\Omega_p^t, \Omega_s^t)$, and then obtain the optimal $\Omega_p^{E,t}$ and $\Omega_p^{G,t}$ from the optimal
$\Omega_p^t$ using Lemma 1.

Next, we exploit the i.i.d. property of the CSI to reduce the state space. We first define partitioned 
actions below: 

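Definition 2 (Partitioned Actions): Given $\Omega = (\Omega_p, \Omega_s)$, we define

$$\Omega(\mathbf{E}, \mathbf{Q}) = \{(\mathbf{p}, \mathbf{s}) = (\Omega_p(\mathbf{E}, \mathbf{Q}), \Omega_s(\mathbf{E}, \mathbf{Q}, \mathbf{H})) : \mathbf{H} \in \boldsymbol{\mathcal{H}}\}, \quad \Omega_s(\mathbf{E}, \mathbf{Q}) = \{\mathbf{s} = \Omega_s(\mathbf{E}, \mathbf{Q}, \mathbf{H}) : \mathbf{H} \in \boldsymbol{\mathcal{H}}\}$$

as the collection of actions $(\mathbf{p}, \mathbf{s})$ and $\mathbf{s}$ for all possible CSI $\mathbf{H}$ conditioned on a given ESI and QSI pair
$(\mathbf{E}, \mathbf{Q})$. $\Omega$ and $\Omega_s$ are therefore equal to the union of all partitioned actions, i.e., $\Omega = \cup_{(\mathbf{E}, \mathbf{Q})} \Omega(\mathbf{E}, \mathbf{Q})$
and $\Omega_s = \cup_{(\mathbf{E}, \mathbf{Q})} \Omega_s(\mathbf{E}, \mathbf{Q})$. ■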

Based on Lemma 1 and Definition 2, the optimal control policy in Problem 1 can be obtained
by solving an equivalent Bellman equation over a reduced state space, which is summarized in the 
lemma below. 

^8 POMDP is an extension of MDP when the control agent does not have direct observation of the entire system state.
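Lemma 2 (Equivalent Bellman Equation for POMDP): The optimal control policy for Problem 1
can be obtained by solving the following equivalent Bellman equation w.r.t. $(\theta, \{V(\mathbf{E}, \mathbf{Q})\})$:

$$\theta + V(\mathbf{E}, \mathbf{Q}) = \min_{\Omega(\mathbf{E}, \mathbf{Q})} \left\{ g((\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})) + \sum_{(\mathbf{E}', \mathbf{Q}')} \Pr[(\mathbf{E}', \mathbf{Q}') \mid (\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})]\, V(\mathbf{E}', \mathbf{Q}') \right\}, \quad \forall (\mathbf{E}, \mathbf{Q}) \in \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}} \tag{10}$$

where $g((\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})) = \sum_{k \in \mathcal{K}} \beta_k f(Q_k) + \sum_{b \in \mathcal{B}} \gamma_b \Omega_{p,b}(\mathbf{E}, \mathbf{Q})\, \mathbb{I}[E_b = 0]$ is the per-stage cost
function and $\Pr[(\mathbf{E}', \mathbf{Q}') \mid (\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})] = \mathbb{E}[\Pr[(\mathbf{E}', \mathbf{Q}') \mid \boldsymbol{\chi}, \Omega(\boldsymbol{\chi})] \mid (\mathbf{E}, \mathbf{Q})]$ is the transition kernel.
$\theta$ is the optimal value for all $\boldsymbol{\chi}$, i.e., $\theta = \min_{\pi} J_{\beta,\gamma}^{\pi}(\boldsymbol{\chi})$, $\forall \boldsymbol{\chi} \in \boldsymbol{\mathcal{X}}$, and $\{V(\mathbf{E}, \mathbf{Q})\}$ is called
the potential function. Furthermore, if $\Omega^*(\mathbf{E}, \mathbf{Q}) = (\Omega_p^*(\mathbf{E}, \mathbf{Q}), \Omega_s^*(\mathbf{E}, \mathbf{Q}))$ attains the minimum of
the R.H.S. of (10) for all $(\mathbf{E}, \mathbf{Q}) \in \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}}$, the stationary policy $\Omega^* = (\Omega_p^*, \Omega_s^*)$ is optimal. ■

Proof: Please refer to Appendix B. ■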
Remark 5 (Interpretation of Equivalent Bellman Equation): The equivalent Bellman equation in 
(10) is defined on the reduced space of the ESI and QSI $(\mathbf{E}, \mathbf{Q})$ only. Nevertheless, by solving (10),
we can obtain a stationary BS-DTX policy $\Omega_p^*$, which is a function of (ESI, QSI), and a stationary
user scheduling policy $\Omega_s^*$, which is a function of (ESI, QSI, CSI). ■
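
For intuition about average-cost Bellman equations of the form "$\theta$ + potential = minimum over actions of (per-stage cost + expected next potential)", the sketch below runs relative value iteration on a toy two-state, two-action MDP. This is a standard textbook technique used here purely for illustration, not the solution method proposed in this paper; the toy costs and transition probabilities are made up, and the state space is far smaller than $\boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}}$.

```python
def relative_value_iteration(cost, trans, ref=0, tol=1e-10, max_iter=10_000):
    """Approximately solve theta + V[s] = min_a {cost[s][a] + sum_s2 P V[s2]}.

    cost[s][a]: per-stage cost, trans[s][a][s2]: transition probability.
    V is normalized so that V[ref] = 0.
    """
    n = len(cost)
    V = [0.0] * n
    theta = 0.0
    for _ in range(max_iter):
        T = [
            min(
                cost[s][a] + sum(p * V[s2] for s2, p in enumerate(trans[s][a]))
                for a in range(len(cost[s]))
            )
            for s in range(n)
        ]
        theta = T[ref]
        V_new = [t - theta for t in T]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return theta, V_new
        V = V_new
    return theta, V


# Toy model: state 0 = short queue, state 1 = long queue;
# action 0 = stay idle, action 1 = transmit on grid power (extra cost 1).
cost = [[0.0, 1.0], [2.0, 3.0]]
trans = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.1, 0.9], [0.7, 0.3]],
]
theta, V = relative_value_iteration(cost, trans)
```

The returned pair satisfies the Bellman equation up to the numerical tolerance. The table size in such brute-force methods grows with the number of states, which is what makes them impractical for the full problem.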

C. Centralized Optimal BS-DTX Control and User Scheduling 

To facilitate the BS-DTX control, which is only adaptive to the ESI and the QSI, we introduce the
BS-DTX control Q-factor $\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p})$ w.r.t. the BS-DTX control action $\mathbf{p}$. Based on Lemma 2, we
summarize the optimal BS-DTX control in the following corollary.

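Corollary 1 (Optimal BS-DTX Control): The optimal BS-DTX control is given by

$$\Omega_p^*(\mathbf{E}, \mathbf{Q}) = \arg\min_{\mathbf{p} \in \boldsymbol{\mathcal{P}}} \mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p}), \quad \forall (\mathbf{E}, \mathbf{Q}) \in \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}} \tag{11}$$

where $\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p})$ is the BS-DTX control Q-factor given by the following Bellman equation w.r.t.
$(\theta, \{\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p})\})$:

$$\theta + \mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p}) = \min_{\Omega_s(\mathbf{E}, \mathbf{Q})} \left\{ g((\mathbf{E}, \mathbf{Q}), \mathbf{p}, \Omega_s(\mathbf{E}, \mathbf{Q})) + \sum_{(\mathbf{E}', \mathbf{Q}')} \Pr[(\mathbf{E}', \mathbf{Q}') \mid (\mathbf{E}, \mathbf{Q}), \mathbf{p}, \Omega_s(\mathbf{E}, \mathbf{Q})] \min_{\mathbf{p}'} \mathbb{Q}(\mathbf{E}', \mathbf{Q}', \mathbf{p}') \right\}, \quad \forall (\mathbf{E}, \mathbf{Q}) \in \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}},\ \mathbf{p} \in \boldsymbol{\mathcal{P}} \tag{12}$$

■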

Proof: Please refer to Appendix B. ■ 
As the distributions of the energy and data arrival processes are unknown to the controllers, we
introduce the post-decision state potential function $U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}})$ to determine the user selection [11]. The
post-decision state $(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}})$ is defined to be the virtual partial system state immediately after making
an action but before the new renewable energy and data arrivals^9. Based on Lemma 2, we summarize the
optimal user scheduling in the following corollary.

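Corollary 2 (Optimal User Scheduling): The optimal user scheduling is given by

$$\Omega_s^*(\boldsymbol{\chi}) = \arg\min_{\mathbf{s} \in \boldsymbol{\mathcal{S}}(\mathbf{p}^*)} \left\{ \sum_{\mathbf{d} \in \boldsymbol{\mathcal{D}}} \left( \prod_{k \in \mathcal{K}} \left(1 - d_k - (-1)^{d_k} \Pr[\rho_k(\mathbf{H}, \mathbf{p}^*, \mathbf{s}) \geq \delta_k]\right) \right) U\left([\mathbf{E} - \mathbf{p}^*]^+, [\mathbf{Q} - \mathbf{d}]^+\right) \right\}, \quad \forall \boldsymbol{\chi} \in \boldsymbol{\mathcal{X}} \tag{13}$$

where $\mathbf{p}^* = \Omega_p^*(\mathbf{E}, \mathbf{Q})$ is the optimal BS-DTX control action given by (11), $\boldsymbol{\mathcal{S}}(\mathbf{p}) = \{\mathbf{s} \in \boldsymbol{\mathcal{S}} :
\sum_{k \in \mathcal{K}_b} s_k = p_b, \forall b \in \mathcal{B}\}$ denotes the feasible user scheduling action space under the BS-DTX control
action $\mathbf{p}$, $d_k \in \mathcal{D} = \{0, 1\}$, and $\mathbf{d} = \{d_k \in \mathcal{D} : k \in \mathcal{K}\} \in \boldsymbol{\mathcal{D}} = \mathcal{D}^K$. $U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}})$ is the post-decision
potential function given by the following Bellman equation w.r.t. $(\theta, \{U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}})\})$:

$$\theta + U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}}) = \sum_{\mathbf{A}^E, \mathbf{A}^Q} \Pr[\mathbf{A}^E] \Pr[\mathbf{A}^Q] \min_{\Omega(\mathbf{E}, \mathbf{Q})} \left\{ g((\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})) + \sum_{(\tilde{\mathbf{E}}', \tilde{\mathbf{Q}}')} \Pr[(\tilde{\mathbf{E}}', \tilde{\mathbf{Q}}') \mid (\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})]\, U(\tilde{\mathbf{E}}', \tilde{\mathbf{Q}}') \right\}, \quad \forall (\tilde{\mathbf{E}}, \tilde{\mathbf{Q}}) \in \boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{Q}} \tag{14}$$

where $\mathbf{E} = \min\{\tilde{\mathbf{E}} + \mathbf{A}^E, N_E\}$ and $\mathbf{Q} = \min\{\tilde{\mathbf{Q}} + \mathbf{A}^Q, N_Q\}$. ■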
Proof: Please refer to Appendix B. ■ 
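
The product term in (13) is simply the probability of a particular departure pattern $\mathbf{d}$: each factor equals the success probability when $d_k = 1$ and its complement when $d_k = 0$. The sketch below enumerates all departure patterns for a small example and computes the resulting expected post-decision value. The function names are hypothetical, the per-user success probabilities are taken as given inputs, and the potential function `U` is a placeholder supplied by the caller.

```python
from itertools import product


def departure_weight(d, success_prob):
    """Probability of departure pattern d given per-user success probabilities."""
    w = 1.0
    for d_k, q_k in zip(d, success_prob):
        w *= 1 - d_k - (-1) ** d_k * q_k  # q_k if d_k == 1, else 1 - q_k
    return w


def expected_post_decision_value(success_prob, energy_after, queues, U):
    """Sum over all d of weight(d) * U(energy_after, [Q - d]^+)."""
    total = 0.0
    for d in product((0, 1), repeat=len(queues)):
        q_after = tuple(max(q - d_k, 0) for q, d_k in zip(queues, d))
        total += departure_weight(d, success_prob) * U(energy_after, q_after)
    return total


probs = [0.25, 0.5]
weights = [departure_weight(d, probs) for d in product((0, 1), repeat=2)]
value = expected_post_decision_value(probs, (1,), (2, 1), lambda e, q: float(sum(q)))
```

The enumeration has $2^K$ terms, which is one reason the centralized scheduling rule does not scale with the number of users.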
Remark 6 (Complexity of Centralized Delay-Optimal Solution): The complexity of obtaining the
original Q-factor and the associated BS-DTX control is $O((N_E + 1)^B (N_Q + 1)^K 2^B)$. The complexity
of obtaining the original post-decision state potential function and the associated user scheduling is 
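
To give a feel for the exponential growth, the following sketch counts the entries of a Q-factor table indexed by the aggregate ESI, the aggregate QSI and a binary DTX action per BS. The count follows from the state and action definitions (each energy queue takes $N_E + 1$ values, each data queue $N_Q + 1$ values, each BS action 2 values) and is an upper bound when the aggregate action space is a strict subset; the function name and the example sizes are illustrative.

```python
def q_factor_table_size(n_e, n_q, num_bs, num_ms):
    """Upper bound on the number of (E, Q, p) entries in a centralized Q-factor table."""
    return (n_e + 1) ** num_bs * (n_q + 1) ** num_ms * 2 ** num_bs


# Even a tiny network (2 BSs, 4 MSs, buffers of size 10) needs millions of entries.
size = q_factor_table_size(n_e=10, n_q=10, num_bs=2, num_ms=4)
```

Adding a single MS multiplies the table size by $N_Q + 1$, which illustrates the curse of dimensionality mentioned in the introduction.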

IV. Low Complexity Delay-aware Distributed Solution 

Obtaining the optimal control in (11) and (13) has exponential complexity and requires centralized
implementation at the BSC and knowledge of the aggregation of the ESI, QSI and CSI, which
leads to huge signaling overhead. In this section, we shall first introduce a randomized base policy.
Based on that, we shall propose a low complexity distributed deterministic policy using approximate
dynamic programming [11]. We shall show that the proposed solution has better performance than
the randomized base policy.

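^9 For example, $\boldsymbol{\chi} = (\mathbf{E}, \mathbf{Q}, \mathbf{H})$ is the state at the beginning of some slot (also called the pre-decision state) and making
an action $(\mathbf{p}, \mathbf{s}) = \Omega(\boldsymbol{\chi})$ leads to $\boldsymbol{\rho} = \{\rho_k : k \in \mathcal{K}\}$ with $\rho_k$ given by (1). Then, the post-decision state immediately after
the action is $\tilde{\boldsymbol{\chi}} = (\tilde{\mathbf{E}}, \tilde{\mathbf{Q}}, \mathbf{H})$, where $\tilde{\mathbf{E}} = [\mathbf{E} - \mathbf{p}]^+$ and $\tilde{\mathbf{Q}} = [\mathbf{Q} - \mathbb{I}[\boldsymbol{\rho} \geq \boldsymbol{\delta}]]^+$, where $\boldsymbol{\delta} = \{\delta_k : k \in \mathcal{K}\}$. If new arrivals
$\mathbf{A}^E$ and $\mathbf{A}^Q$ occur in the post-decision state, and the CSI changes to $\mathbf{H}'$, then the system reaches the next actual state,
i.e., pre-decision state $\boldsymbol{\chi}' = (\min\{\tilde{\mathbf{E}} + \mathbf{A}^E, N_E\}, \min\{\tilde{\mathbf{Q}} + \mathbf{A}^Q, N_Q\}, \mathbf{H}')$.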
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A. Randomized Base Policy 

We first introduce a randomized base policy and discuss an important structural property of the equivalent Bellman equations in (12) and (14) under this base policy.

Definition 3 (Randomized Base Policy): A randomized base policy is denoted as $\hat{\Omega} = (\hat{\Omega}_p, \hat{\Omega}_s)$. The randomized base policy for BS-DTX control $\hat{\Omega}_p$ is given by a distribution on the action space of $\mathbf{p}$, i.e., $\mathcal{P}$. The randomized base policy for user scheduling $\hat{\Omega}_s$ is given by a mapping from the CSI $\mathbf{H}$ to a probability distribution $\hat{\Omega}_s(\mathbf{H})$ on the action space of $\mathbf{s}$, i.e., $\mathcal{S}$. ■

Under a randomized base policy, the corresponding Q-factor and post-decision potential function 
have the following decomposition structure. 

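Lemma 3 (Decomposition under Randomized Base Policy): Given any randomized base policy $\hat{\Omega}$, the Q-factor $\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p})$ and the potential function $U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}})$ associated with the equivalent Bellman equations in (12) and (14) can be expressed as $\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p}) = \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} \mathbb{Q}_k(E_b, Q_k, \mathbf{p})$ and $U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}}) = \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} U_k(\tilde{E}_b, \tilde{Q}_k)$, where

$$\theta_k + \mathbb{Q}_k(E_b, Q_k, \mathbf{p}) = g_k(E_b, Q_k, p_b) + \sum_{(E_b', Q_k')} \Pr[(E_b', Q_k') | (E_b, Q_k), \mathbf{p}] \, \mathbb{E}^{\hat{\Omega}_p}[\mathbb{Q}_k(E_b', Q_k', \mathbf{p}')] \tag{15}$$

$$\theta_k + U_k(\tilde{E}_b, \tilde{Q}_k) = \sum_{A_b^E, A_k^Q} \Pr[A_b^E] \Pr[A_k^Q] \Big( \mathbb{E}^{\hat{\Omega}_p}[g_k(E_b, Q_k, p_b)] + \sum_{(\tilde{E}_b', \tilde{Q}_k')} \mathbb{E}^{\hat{\Omega}_p}\big[\Pr[(\tilde{E}_b', \tilde{Q}_k') | (E_b, Q_k), \mathbf{p}]\big] U_k(\tilde{E}_b', \tilde{Q}_k') \Big) \tag{16}$$

with $g_k(E_b, Q_k, p_b) = \beta_k f(Q_k) + \gamma_b p_b I[E_b = 0] \, \mathbb{E}[\Pr[s_k = 1 | \mathbf{H}]]$ and $\Pr[(E_b', Q_k') | (E_b, Q_k), \mathbf{p}] = \mathbb{E}\big[\mathbb{E}^{\hat{\Omega}_s}[\Pr[(E_b', Q_k') | (E_b, Q_k, \mathbf{H}), p_b, s_k] \,|\, \mathbf{H}]\big]$. ■
Proof: Please refer to Appendix C. ■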



B. Low Complexity Delay-aware Distributed Solution 

Based on the randomized base policy $\hat{\Omega}$, we shall obtain a low complexity distributed deterministic policy $\Omega^*$ by Q-factor and potential function approximation. The solution is elaborated below.

1) BS-DTX Control Policy Over a Longer Timescale: To reduce the complexity and to facilitate distributed implementation, we approximate the BS-DTX control Q-factor $\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p})$ in (12) by $\hat{\mathbb{Q}}(\mathbf{E}, \mathbf{Q}, \mathbf{p})$, i.e.,

$$\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p}) \approx \hat{\mathbb{Q}}(\mathbf{E}, \mathbf{Q}, \mathbf{p}) = \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} \mathbb{Q}_k(E_b, Q_k, \mathbf{p}) \tag{17}$$

where $\mathbb{Q}_k(E_b, Q_k, \mathbf{p})$ is given by the per-flow fixed point equation in (15). The BSC determines the BS-DTX control based on the aggregation of the ESI and QSI according to

$$\mathbf{p}^*(\mathbf{E}, \mathbf{Q}) = \arg\min_{\mathbf{p} \in \mathcal{P}} \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} \mathbb{Q}_k(E_b, Q_k, \mathbf{p}) \tag{18}$$

Remark 7 (Complexity of the BS-DTX Control): Under the linear Q-factor approximation in (17), the complexity of obtaining the BS-DTX control is reduced from $O((N_E + 1)^B (N_Q + 1)^K 2^B)$ to $O((N_E + 1)(N_Q + 1) 2^B K)$. To further reduce the complexity w.r.t. $B$, we can partition the BSs into macro-groups with size $N_B$. The BS-DTX control in (18) can be done for each of the macro-groups separately [16]. In practice, $N_B \ll B$ and hence, the complexity becomes $O((N_E + 1)(N_Q + 1) 2^{N_B} \frac{B}{N_B} K)$, which is linear w.r.t. $B$. ■
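To get a feel for the reduction described in Remark 7, the table sizes can simply be counted. The sketch below assumes the three complexity orders read as (N_E+1)^B (N_Q+1)^K 2^B for the centralized Q-factor, (N_E+1)(N_Q+1) 2^B K for the per-flow approximation, and (N_E+1)(N_Q+1) 2^{N_B} (B/N_B) K with macro-groups; the function names and the example parameters are illustrative assumptions.

```python
def centralized_q_entries(n_e, n_q, num_bs, num_flows):
    """Entries of the joint Q-factor over (E, Q, p)."""
    return (n_e + 1) ** num_bs * (n_q + 1) ** num_flows * 2 ** num_bs


def per_flow_q_entries(n_e, n_q, num_bs, num_flows):
    """Entries of all per-flow Q-factors Q_k(E_b, Q_k, p)."""
    return (n_e + 1) * (n_q + 1) * 2 ** num_bs * num_flows


def macro_group_q_entries(n_e, n_q, num_bs, num_flows, group_size):
    """Entries when the BS-DTX control is solved per macro-group.

    Assumes group_size divides num_bs (a simplification for this sketch).
    """
    num_groups = num_bs // group_size
    return (n_e + 1) * (n_q + 1) * 2 ** group_size * num_groups * num_flows


# Hypothetical small network: 4 BSs, 8 flows, N_E = 5, N_Q = 15, groups of 2 BSs.
print(centralized_q_entries(5, 15, 4, 8))
print(per_flow_q_entries(5, 15, 4, 8))
print(macro_group_q_entries(5, 15, 4, 8, 2))
```

Even for this small example the joint table is many orders of magnitude larger than the per-flow tables, which is the motivation for the linear approximation.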

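2) Distributed User Scheduling Policy at the CM Over a Shorter Timescale: To reduce the complexity and to facilitate distributed implementation of the user scheduling, we approximate the post-decision state potential function $U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}})$ in (14) by $\hat{U}(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}})$, i.e.,

$$U(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}}) \approx \hat{U}(\tilde{\mathbf{E}}, \tilde{\mathbf{Q}}) = \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} U_k(\tilde{E}_b, \tilde{Q}_k) \tag{19}$$

where $U_k(\tilde{E}_b, \tilde{Q}_k)$ is given by the per-flow fixed point equation in (16). Substituting the approximation in (19) into the optimal user scheduling in (13), the user scheduling solution under the approximation is summarized below.

Lemma 4 (Distributed User Scheduling): Under the linear potential function approximation in (19), the distributed user scheduling action $\mathbf{s}_n^*$ of the $n$-th cluster based on the per-cluster ESI, QSI and CSI under $\mathbf{p}^*(\mathbf{E}, \mathbf{Q})$ obtained by (18) is given by

$$\mathbf{s}_n^*(\mathbf{E}_n, \mathbf{Q}_n, \mathbf{H}_n) = \arg\max_{\mathbf{s}_n \in \mathcal{S}_n(\mathbf{p}_n^*)} \sum_{b \in \mathcal{B}_n} \sum_{k \in \mathcal{K}_b} s_k \Pr[\rho_k(\mathbf{H}_n, \mathbf{p}^*, \mathbf{s}_n) > \delta_k] \Big( U_k\big([E_b - p_b^*]^+, Q_k\big) - U_k\big([E_b - p_b^*]^+, [Q_k - 1]^+\big) \Big), \quad \forall \mathbf{E}_n \in \mathcal{E}_n, \mathbf{Q}_n \in \mathcal{Q}_n, \mathbf{H}_n \in \mathcal{H}_n \tag{20}$$

where $\mathcal{S}_n(\mathbf{p}_n) \triangleq \{\mathbf{s}_n \in \{0, 1\}^{|\mathcal{K}_n|} : \sum_{k \in \mathcal{K}_b} s_k = p_b, b \in \mathcal{B}_n\}$ denotes the feasible user scheduling action space of cluster $n$ under the BS-DTX control action $\mathbf{p}_n$. ■
Proof: Please refer to Appendix D. ■
Remark 8 (Complexity of the User Scheduling): The user scheduling action in (20) is a function of the per-cluster ESI, QSI and CSI, and is computed locally at the $n$-th CM. Under the linear potential function approximation in (19), the complexity of user scheduling is reduced from $O((N_E + 1)^B (N_Q + 1)^K)$ to $O((N_E + 1)(N_Q + 1) K)$. ■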
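The structure of the distributed scheduling rule in Lemma 4 can be illustrated for a single BS. The sketch below is a simplification: it assumes that the success probability of each flow is already known and does not depend on which flows are scheduled at other BSs of the cluster, whereas in the paper the SINR depends on the joint cluster scheduling. The function `schedule_cell`, the table layout `U[k][e][q]` and the quadratic potential used in the example are hypothetical.

```python
def schedule_cell(active, success_prob, U, e_post, queues):
    """Pick the flow scheduled by one BS, or None if the BS is switched off.

    active:          BS-DTX decision p_b in {0, 1}
    success_prob[k]: stands in for Pr[rho_k > delta_k] when flow k is scheduled
    U[k][e][q]:      per-flow potential table (illustrative layout)
    e_post:          post-decision energy level [E_b - p_b]^+
    queues[k]:       current queue length Q_k
    """
    if not active:
        return None

    def gain(k):
        q = queues[k]
        # Expected reduction of the per-flow potential if flow k is served.
        return success_prob[k] * (U[k][e_post][q] - U[k][e_post][max(q - 1, 0)])

    return max(range(len(queues)), key=gain)


# Two flows with an assumed quadratic potential U_k(e, q) = q^2.
U = [[[q ** 2 for q in range(5)] for _e in range(2)] for _k in range(2)]
print(schedule_cell(1, [0.5, 0.9], U, 0, [3, 1]))
print(schedule_cell(1, [0.9, 0.2], U, 0, [1, 3]))
```

The rule weighs channel quality against queue backlog: a flow with a poor channel can still be selected when its backlog makes the potential difference large.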

C. Performance of Low Complexity Delay-aware Distributed Solution 

The key motivation of the linear approximations of the Q-factor and the potential function in (17) and (19) is to facilitate distributed control. The following theorem shows that the proposed distributed policy always achieves better performance than the randomized base policy.






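Theorem 1 (Performance Improvement): If $\Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), (\mathbf{p}, \mathbf{s})] \neq \Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), (\mathbf{p}', \mathbf{s}')]$ for any $(\mathbf{p}, \mathbf{s}) \neq (\mathbf{p}', \mathbf{s}')$ and $(\mathbf{E}, \mathbf{Q}) \in \mathcal{E} \times \mathcal{Q}$, then we have $\theta^*(\mathbf{E}, \mathbf{Q}) < \hat{\theta}$ for all $(\mathbf{E}, \mathbf{Q}) \in \mathcal{E} \times \mathcal{Q}$, where $\theta^*(\mathbf{E}, \mathbf{Q})$ is the average cost under the proposed solution starting from state $(\mathbf{E}, \mathbf{Q})$ and $\hat{\theta}$ is the average cost under any randomized base policy. ■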
Proof: Please refer to Appendix E. ■ 

V. Distributed Online Learning via Stochastic Approximation 

Observe that the BS-DTX control and the user scheduling in (18) and (20) require the knowledge of $\{\mathbb{Q}_k(E_b, Q_k, \mathbf{p})\}$ and $\{U_k(\tilde{E}_b, \tilde{Q}_k)\}$, respectively, which are defined in the fixed point equations in (15) and (16), respectively. However, solving these fixed point equations is also quite challenging. In this section, we shall propose an online distributed stochastic learning [18] algorithm to estimate $\{\mathbb{Q}_k(E_b, Q_k, \mathbf{p})\}$ and $\{U_k(\tilde{E}_b, \tilde{Q}_k)\}$ using the per-cluster system state information only. We shall prove that the proposed distributed algorithm converges almost surely to the fixed point solutions.

A. Distributed Online Learning for $\{\mathbb{Q}_k(E_b, Q_k, \mathbf{p})\}$ and $\{U_k(\tilde{E}_b, \tilde{Q}_k)\}$

Since the statistics of $\mathbf{A}^E(t)$ and $\mathbf{A}^Q(t)$ are unknown to the controller, instead of computing $\{\mathbb{Q}_k(E_b, Q_k, \mathbf{p})\}$ and $\{U_k(\tilde{E}_b, \tilde{Q}_k)\}$ of a chosen $\hat{\Omega}$ offline, we shall estimate them distributively at each BS based on the instantaneous observations.

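Algorithm 1: (Online Per-User Q-factor and Potential Function Learning Algorithm)

• Step 1 [Initialization at the BSs]: Set $t = 0$. Each BS $b$ initializes $\{\mathbb{Q}_k^0(E_b, Q_k, \mathbf{p})\}$ and $\{U_k^0(\tilde{E}_b, \tilde{Q}_k)\}$ for all $k \in \mathcal{K}_b$.

• Step 2 [BS-DTX Control at the BSC]: At the beginning of the $t$-th slot, each BS $b$ reports $\{\sum_{k \in \mathcal{K}_b} \mathbb{Q}_k^t(E_b(t), Q_k(t), \mathbf{p}) : \mathbf{p} \in \mathcal{P}\}$ to the BSC. The BSC determines the BS-DTX control $\mathbf{p}^*(t) \triangleq \mathbf{p}^*(\mathbf{E}(t), \mathbf{Q}(t))$ according to (18) and broadcasts $\mathbf{p}^*(t)$ to all the CMs. Each CM $n$ informs each BS $b \in \mathcal{B}_n$ of $p_b^*(t)$. Each BS $b$ determines its renewable and grid power allocations, i.e., $p_b^E(t) = p_b^*(t) I[E_b(t) > 0]$ and $p_b^G(t) = p_b^*(t) I[E_b(t) = 0]$, respectively.

• Step 3 [User Scheduling at the CMs]: Each BS $b$ reports $\{U_k^t(\tilde{E}_b(t), \tilde{Q}_k(t)) : k \in \mathcal{K}_b\}$ to its CM. Each CM $n$ determines the user selection $\mathbf{s}_n^*(t) = \mathbf{s}_n^*(\mathbf{E}_n(t), \mathbf{Q}_n(t), \mathbf{H}_n(t))$ according to (20) under the given BS-DTX control $\mathbf{p}^*(t)$.

• Step 4 [Per-flow Q-factor and Potential Function Update at the BSs]: Based on the current observations $A_b^E(t)$ and $A_k^Q(t)$ ($k \in \mathcal{K}_b$), each BS $b$ updates the per-flow Q-factor and potential function for the MSs in its cell according to (21) and (22) for all $k \in \mathcal{K}_b$:

$$\mathbb{Q}_k^{t+1}(E_b, Q_k, \mathbf{p}) = \mathbb{Q}_k^t(E_b, Q_k, \mathbf{p}) + \epsilon_t \big[ F_k(\mathbb{Q}_k^t, E_b, Q_k, \mathbf{p}) - F_k(\mathbb{Q}_k^t, E_b^I, Q_k^I, \mathbf{p}^I) - \mathbb{Q}_k^t(E_b, Q_k, \mathbf{p}) \big], \quad \forall E_b \in \mathcal{E}, Q_k \in \mathcal{Q}, \mathbf{p} \in \mathcal{P} \tag{21}$$

$$U_k^{t+1}(\tilde{E}_b, \tilde{Q}_k) = U_k^t(\tilde{E}_b, \tilde{Q}_k) + \epsilon_t \big[ T_k(U_k^t, \tilde{E}_b, \tilde{Q}_k) - T_k(U_k^t, \tilde{E}_b^I, \tilde{Q}_k^I) - U_k^t(\tilde{E}_b, \tilde{Q}_k) \big], \quad \forall \tilde{E}_b \in \mathcal{E}, \tilde{Q}_k \in \mathcal{Q} \tag{22}$$

where

$$F_k(\mathbb{Q}_k^t, E_b, Q_k, \mathbf{p}) \triangleq g_k(E_b, Q_k, p_b) + \sum_{(\tilde{E}_b', \tilde{Q}_k')} \Pr[(\tilde{E}_b', \tilde{Q}_k') | (E_b, Q_k), \mathbf{p}] \times \mathbb{E}^{\hat{\Omega}_p}\big[\mathbb{Q}_k^t\big(\min\{\tilde{E}_b' + A_b^E(t), N_E\}, \min\{\tilde{Q}_k' + A_k^Q(t), N_Q\}, \mathbf{p}'\big)\big] \tag{23}$$

$$T_k(U_k^t, \tilde{E}_b, \tilde{Q}_k) \triangleq \mathbb{E}^{\hat{\Omega}_p}\big[g_k\big(\min\{\tilde{E}_b + A_b^E(t), N_E\}, \min\{\tilde{Q}_k + A_k^Q(t), N_Q\}, p_b\big)\big] + \sum_{(\tilde{E}_b', \tilde{Q}_k')} \mathbb{E}^{\hat{\Omega}_p}\Big[\Pr\big[(\tilde{E}_b', \tilde{Q}_k') \,\big|\, \big(\min\{\tilde{E}_b + A_b^E(t), N_E\}, \min\{\tilde{Q}_k + A_k^Q(t), N_Q\}\big), \mathbf{p}\big]\Big] U_k^t(\tilde{E}_b', \tilde{Q}_k') \tag{24}$$

In (23), $\tilde{E}_b' = [E_b - p_b]^+$ and $\tilde{Q}_k' = [Q_k - I[\rho_k(\mathbf{H}_n, \mathbf{p}, \mathbf{s}_n) > \delta_k]]^+$. In (24), $\tilde{E}_b' = [\min\{\tilde{E}_b + A_b^E(t), N_E\} - p_b]^+$ and $\tilde{Q}_k' = [\min\{\tilde{Q}_k + A_k^Q(t), N_Q\} - I[\rho_k(\mathbf{H}_n, \mathbf{p}, \mathbf{s}_n) > \delta_k]]^+$. $\mathbf{p}^I$ is the reference BS-DTX control action, and $(E_b^I, Q_k^I)$ and $(\tilde{E}_b^I, \tilde{Q}_k^I)$ are the reference states for the Q-factor update in (21) and the potential function update in (22), respectively. $\{\epsilon_t\}$ is a diminishing positive step size sequence satisfying the following conditions: $\epsilon_t > 0$, $\sum_t \epsilon_t = \infty$, $\sum_t \epsilon_t^2 < \infty$.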
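The shape of the updates in Step 4 (current estimate, plus a diminishing step times "operator value minus reference value minus current estimate") can be illustrated on a toy problem. The sketch below is a noise-free analogue, not the paper's algorithm: it uses a fixed two-state Markov chain with known transition matrix in place of the per-flow system, applies the exact expectation instead of instantaneous arrival observations, and uses the step size 1/(t+1), which satisfies the stated step size conditions. All names and numbers are illustrative assumptions.

```python
def rvi_step(h, costs, P, ref, eps):
    """One update h <- h + eps * (T(h) - T(h)[ref] - h) under a fixed policy.

    T(h)[i] = costs[i] + sum_j P[i][j] * h[j]; ref is the reference state.
    """
    n = len(h)
    T = [costs[i] + sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    return [h[i] + eps * (T[i] - T[ref] - h[i]) for i in range(n)]


def run(costs, P, ref=0, num_iters=20000):
    h = [0.0] * len(costs)
    for t in range(num_iters):
        eps = 1.0 / (t + 1)  # sum of eps diverges, sum of eps^2 converges
        h = rvi_step(h, costs, P, ref, eps)
    return h


# Toy chain: state 1 costs 1 per stage, state 0 costs nothing.
h = run([0.0, 1.0], [[0.9, 0.1], [0.5, 0.5]])
print(h)
```

For this chain the relative potential of state 1 with respect to the reference state 0 solves d = 1 + 0.4 d, i.e., d = 5/3, and the iterates approach that value slowly, as is typical for 1/t step sizes.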



B. Performance of the Distributed Learning Algorithm 
The convergence of Algorithm 1 is summarized below.

Lemma 5 (Convergence of Algorithm 1): The iterative updates of the per-flow Q-factor and the per-flow potential function in (21) and (22) converge almost surely, i.e., $\lim_{t \to \infty} \mathbb{Q}_k^t = \mathbb{Q}_k^{\infty}$ and $\lim_{t \to \infty} U_k^t = U_k^{\infty}$ a.s. ($\forall k \in \mathcal{K}$), where $\mathbb{Q}_k^{\infty}$ and $U_k^{\infty}$ are the solutions of the fixed point equations in (15) and (16), respectively. ■
Proof: Please refer to Appendix F. ■ 
Remark 9 (Signaling Requirement of Distributed Two-Timescale Algorithm 1):
• Signaling requirement over a short timescale (per slot): Each BS needs to collect the local CSI over the radio interface. The BSs within a cluster also need to report the local CSI to their CM. Yet, the signaling load and the latency requirement for this part are in fact similar to those of the existing HSDPA and LTE systems.

• Signaling requirement through the backhaul over a long timescale (in convergent stage): Each BS needs to report the Q-factors of the (updated) local QSI to the BSC (for the BS-DTX control) as well as the potential functions of the (updated) local QSI to the CM (for the user scheduling within a cluster). These signaling exchanges are over the high-speed backhaul and over a longer timescale (not on a slot by slot basis). The latency of signaling over the backhaul (typically less than 10 ms) is negligible. ■

The reference action and states are used to bootstrap the online learning algorithms [19] for (21) and (22), respectively. Without loss of generality, we set $E_b^I = 0$, $Q_k^I = 0$, $\mathbf{p}^I = \{p_b^I = 1 : b \in \mathcal{B}\}$, $\tilde{E}_b^I = 0$ and $\tilde{Q}_k^I = 0$.

VI. Stability Analysis

In this section, we shall analyze the stability conditions for the data queues in the coordinated MIMO networks with infinite data buffer size ($N_Q = \infty$) and finite energy storage size ($N_E < \infty$), and discuss various design insights. We have the following assumption on the BS and MS distributions.

Assumption 5 (BS and MS Distributions): The location of the BSs follows a homogeneous Poisson Point Process (PPP) $\Phi$ of density $\lambda$ and the location of the MSs follows some independent stationary point process in the Euclidean plane [12], [13]. Each MS is associated with the closest BS, i.e., the MSs in the Voronoi cell of a BS are associated with it. ■
To simplify the analysis, we consider a homogeneous network with = 1, $P_b = P$ $\forall b \in \Phi$ and $\delta_k = \delta$ $\forall k \in \mathcal{K}$. In addition, we assume the CSI follows complex Gaussian fading and the long-term path gain follows the standard power law $L_{k,b} = r_{k,b}^{-\alpha}$, where $r_{k,b}$ is the distance between BS $b$ and MS $k$ and $\alpha > 2$ is the path loss exponent. Furthermore, the renewable energy and bursty data arrivals under Assumptions 3 and 4 are specialized to Bernoulli processes, i.e., $A_b^E(t), A_k^Q(t) \in \{0, 1\}$, $\mathbb{E}[A_b^E(t)] = \lambda^E < 1$ and $\mathbb{E}[A_k^Q(t)] = \lambda^Q < 1$ for all $k \in \mathcal{K}$ and $b \in \Phi$. We consider the following randomized BS-DTX policy.

Definition 4 (Randomized BS-DTX Control Policy): At each slot $t$, each BS $b \in \Phi$ is active with probability $p_{tx} > 0$, i.e., $\Pr[p_b(t) = 1] = p_{tx}$, if $\sum_{k \in \mathcal{K}_b} Q_k(t) > 0$; $p_b(t) = 0$ otherwise. ■
In the following, we shall analyze the sufficient conditions for the queue stability of a randomly chosen user (i.e., $Q_k(t)$ having a steady state limiting distribution for $t \to \infty$ [20]) under the randomized policy in Definition 4.

A. Stability Analysis for Systems without BS Coordination ($N_t = 1$)

In this case, we consider no cooperation among BSs ($N_t = 1$). Using stochastic geometry [13] and the technique of parallel dominant queues [21], [22], the following lemma summarizes the sufficient condition for the queue stability of a randomly chosen MS at a distance $r_1$ from its BS.

Lemma 6 (Sufficient Condition for Queue Stability without BS Coordination): The data queue of a randomly chosen MS is stable if

$$\lambda^Q < p_{tx} \exp\Big( -C_{N_t} p_{tx} \lambda - \frac{N_0}{P} \delta r_1^{\alpha} \Big) \triangleq \lambda_{\max}^Q(p_{tx}, N_t) \tag{25}$$

In addition, $\lambda_{\max}^Q(p_{tx}, N_t)$ corresponds to the maximum average grid power cost per BS $p_{\max}^G(p_{tx}, N_E) = (1 - f(p_{tx}, N_E)) p_{tx}$. Here $N_t = 1$, $C_1 = \frac{2}{\alpha - 2} \pi r_1^2 \delta$ and

$$f(p_{tx}, N_E) = \frac{\big(\frac{\lambda^E}{p_{tx}}\big)\Big(1 - \big(\frac{\lambda^E}{p_{tx}}\big)^{N_E}\Big)}{1 - \big(\frac{\lambda^E}{p_{tx}}\big)^{N_E + 1}} \tag{26}$$



Proof: Please refer to Appendix G. ■ 
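The behaviour of the energy-queue term in Lemma 6 can be checked numerically. The sketch below assumes that $f(p_{tx}, N_E)$ has the finite-buffer form rho (1 - rho^{N_E}) / (1 - rho^{N_E + 1}) with rho = lambda^E / p_tx, which is the reading of (26) consistent with the limit min{lambda^E / p_tx, 1} for infinite storage; the function names and parameter values are illustrative assumptions.

```python
def nonempty_prob(lam_e, p_tx, n_e):
    """Assumed form of f(p_tx, N_E): probability that the energy queue is non-empty."""
    rho = lam_e / p_tx
    if abs(rho - 1.0) < 1e-12:
        # Limit of the closed form as rho -> 1.
        return n_e / (n_e + 1.0)
    return rho * (1.0 - rho ** n_e) / (1.0 - rho ** (n_e + 1))


def max_grid_power(lam_e, p_tx, n_e):
    """Average grid power cost per BS in the dominant network: (1 - f) * p_tx."""
    return (1.0 - nonempty_prob(lam_e, p_tx, n_e)) * p_tx


for n_e in (1, 2, 5, 20):
    print(n_e, nonempty_prob(0.25, 0.5, n_e), max_grid_power(0.25, 0.5, n_e))
```

With lambda^E = 0.25 and p_tx = 0.5 the non-empty probability rises from 1/3 towards 0.5 as the storage grows, so the grid power cost falls accordingly.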
Remark 10 (Interpretation of Lemma 6): $f(p_{tx}, N_E)$ can be interpreted as the probability that an energy queue is non-empty in a parallel dominant network, i.e., $f(p_{tx}, N_E) = \Pr[E_b > 0]$. It can be easily verified from (26) that $f(p_{tx}, N_E)$ increases as $N_E$ increases and $\lim_{N_E \to \infty} f(p_{tx}, N_E) = \min\{\frac{\lambda^E}{p_{tx}}, 1\} = f(p_{tx}, \infty)$, which corresponds to the case with infinite energy storage size. In addition.



B. Stability Analysis for Systems with BS Coordination (Nt > 1) 

In this part, we extend the analysis to the case with BS coordination ($N_t > 1$). For a randomly chosen MS, the interference comes from the active BSs outside its cluster. Hence, we need to consider the distribution of the coordination clusters, and the associated analysis is more challenging compared with the case without BS coordination ($N_t = 1$).

Lemma 7 (Sufficient Condition for Queue Stability with BS Coordination): For $N_t > \frac{\alpha}{2}$, the data queue of a randomly chosen MS in the coordinated MIMO network can be stabilized if

$$\lambda^Q < p_{tx} \exp\Big( -C_{N_t} p_{tx} \lambda - \frac{N_0}{P} \delta r_1^{\alpha} \Big) \triangleq \lambda_{\max}^Q(p_{tx}, N_t) \tag{27}$$

In addition, $\lambda_{\max}^Q(p_{tx}, N_t)$ corresponds to the maximum average grid power cost per BS $p_{\max}^G(p_{tx}, N_E) = (1 - f(p_{tx}, N_E)) p_{tx}$. Here $C_{N_t} = \frac{2}{\alpha - 2} \pi r_1^2 \delta \min\Big\{1, r_1^{\alpha - 2} (\lambda \pi)^{\frac{\alpha}{2} - 1} \frac{\Gamma(N_t - \frac{\alpha}{2})}{\Gamma(N_t - 1)}\Big\} = O(N_t^{1 - \frac{\alpha}{2}})$ as $N_t \to \infty$. ■
Proof: Please refer to Appendix H. ■ 
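The effect of the cluster size on the stability region in Lemma 7 can be explored numerically. The sketch below assumes the constant reads C_{N_t} = (2/(alpha-2)) pi r_1^2 delta min{1, r_1^{alpha-2} (lambda pi)^{alpha/2-1} Gamma(N_t - alpha/2) / Gamma(N_t - 1)} for N_t > alpha/2, with C_1 = (2/(alpha-2)) pi r_1^2 delta, and that the stability threshold is p_tx exp(-C_{N_t} p_tx lambda - noise term). Function names, the `noise_term` argument and the parameter values are illustrative assumptions.

```python
import math


def c_coord(n_t, r1, delta, alpha, lam):
    """Assumed interference constant C_{N_t} for cluster size n_t."""
    c1 = 2.0 / (alpha - 2.0) * math.pi * r1 ** 2 * delta
    if n_t == 1:
        return c1
    if n_t <= alpha / 2.0:
        raise ValueError("requires n_t > alpha / 2")
    # Gamma(n_t - alpha/2) / Gamma(n_t - 1), computed via log-gamma for stability.
    gamma_ratio = math.exp(math.lgamma(n_t - alpha / 2.0) - math.lgamma(n_t - 1))
    shrink = r1 ** (alpha - 2.0) * (lam * math.pi) ** (alpha / 2.0 - 1.0) * gamma_ratio
    return c1 * min(1.0, shrink)


def max_stable_rate(p_tx, c, lam, noise_term=0.0):
    """Assumed stability threshold p_tx * exp(-c * p_tx * lam - noise_term)."""
    return p_tx * math.exp(-c * p_tx * lam - noise_term)


lam = 1.0 / math.pi  # BS density chosen so that lam * pi = 1
for n_t in (1, 2, 3, 4, 8):
    c = c_coord(n_t, r1=1.0, delta=1.0, alpha=3.0, lam=lam)
    print(n_t, round(c, 4), round(max_stable_rate(0.5, c, lam), 4))
```

In this example the constant stays at C_1 for N_t = 2 (the min is active) and then shrinks with N_t, so the supported arrival rate grows with the cluster size.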

When $N_E \to \infty$ and $p_{tx} = 1$, the result in (25) reduces to the coverage probability for cellular networks without BS coordination obtained in [12].

In the parallel dominant network, dummy packets are transmitted if a data queue is empty. Thus, the BS-DTX controls are decoupled from the data queues, i.e., independent of the QSI, and hence, the renewable and grid power consumptions are symmetric across all the BSs.

Note that $N_t > \frac{\alpha}{2}$ implies $N_t > 1$ for most of the cases we are interested in, as we usually have $2 < \alpha < 4$ in practical systems.

C. Optimization of Randomized Policy

We are interested in maximizing $\lambda_{\max}^Q(p_{tx}, N_t)$ under a grid power constraint w.r.t. the parameter $p_{tx}$ in the randomized control policy for any given $N_E > 0$ and $N_t \geq 1$. Specifically, we have

$$p_{tx}^*(N_E, N_t) = \arg\max_{p_{tx} \in [0, 1]} \lambda_{\max}^Q(p_{tx}, N_t) \quad \text{s.t.} \quad p_{\max}^G(p_{tx}, N_E) \leq P^G \tag{28}$$

Let $\lambda_{\max}^{Q*}(N_E, N_t) = \lambda_{\max}^Q(p_{tx}^*, N_t)$ denote the optimal value of the optimization problem in (28). Let $x^*(N_E)$ denote the solution to $p_{\max}^G(x, N_E) = P^G$ for any given $N_E > 0$. The following theorem summarizes the optimal solution.

Theorem 2 (Optimization Solution for Queue Stability): $p_{tx}^*(N_E, N_t) = \min\{x^*(N_E), 1, \frac{1}{C_{N_t} \lambda}\}$. For any given $N_t \geq 1$, $\lambda_{\max}^{Q*}(N_E, N_t)$ is strictly increasing in $N_E$ if $x^*(N_E) < \min\{1, \frac{1}{C_{N_t} \lambda}\}$ and is a constant for all $N_E$ if $x^*(N_E) \geq \min\{1, \frac{1}{C_{N_t} \lambda}\}$. For any given $N_E > 0$, $\lambda_{\max}^{Q*}(N_E, N_t)$ is strictly increasing in $N_t$. ■
Proof: Please refer to Appendix I. ■
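The closed-form solution in Theorem 2 can be evaluated with a short numerical sketch. It assumes that the optimal activation probability is min{x*, 1, 1/(C_{N_t} lambda)}, where x* solves (1 - f(x, N_E)) x = P^G, and that 1 - f(x, N_E) equals 1 / sum_{i=0}^{N_E} (lambda^E / x)^i (the finite-buffer empty-queue probability). The function names, the bisection solver and the parameter values are illustrative assumptions; x* is reported as infinity when the grid power constraint is slack at x = 1.

```python
def grid_power(x, lam_e, n_e):
    """Assumed average grid power (1 - f(x, N_E)) * x for activation probability x."""
    rho = lam_e / x
    empty_prob = 1.0 / sum(rho ** i for i in range(n_e + 1))
    return empty_prob * x


def solve_x_star(lam_e, n_e, p_grid, tol=1e-10):
    """Solve grid_power(x) = p_grid on (0, 1] by bisection (grid_power increases in x)."""
    if grid_power(1.0, lam_e, n_e) <= p_grid:
        return float("inf")  # constraint inactive for every x in (0, 1]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if grid_power(mid, lam_e, n_e) <= p_grid:
            lo = mid
        else:
            hi = mid
    return lo


def optimal_p_tx(lam_e, n_e, p_grid, c, lam):
    """min{x*, 1, 1 / (c * lam)} with c standing in for C_{N_t}."""
    return min(solve_x_star(lam_e, n_e, p_grid), 1.0, 1.0 / (c * lam))


print(optimal_p_tx(0.25, 1, 0.25, c=1.0, lam=1.0))  # power-limited case
print(optimal_p_tx(0.25, 1, 0.90, c=4.0, lam=1.0))  # interference-limited case
```

The first call is limited by the grid power budget, while the second is limited by 1/(C lambda), which mirrors the power-limited and interference-limited regions discussed for Theorem 2.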

VII. Results and Discussions

In this section, we shall discuss the design insights from the analytical results in Section VI. We also compare the delay performance gain of the proposed delay-aware low complexity distributed scheme in Section IV and Section V with the following two baseline schemes using simulation.

• Baseline 1 [CSI-based Single Cell Scheme]: Baseline 1 refers to the randomized BS-DTX control and CSI-based user scheduling without BS coordination. Each multi-antenna BS uses maximal ratio combining (MRC) and selects one MS with the maximum successful packet transmission probability based on the observed local CSI.

• Baseline 2 [CSI-based Clustered Coordinated MIMO Scheme]: Baseline 2 refers to the randomized BS-DTX control and CSI-based clustered coordinated MIMO with the same coordinated beamforming as the proposed scheme. Each CM determines the user scheduling to maximize the sum successful packet transmission probability of each cluster based on the observed per-cluster CSI.

In the simulation, we consider a cellular system with 19 BSs, each with a coverage of 500 m and 2 mobiles per cell, which are distributed uniformly at the cell edge within the range [400 m, 500 m] from the BS. We apply the Urban Macrocell Model in 3GPP [23] with the path loss model given by $PL = 34.5 + 35 \log_{10}(r)$, where $r$ (in m) is the distance from the transmitter to the receiver. Each element of $\mathbf{h}_{k,b}$ is $\mathcal{CN}(0, 1)$. The total bandwidth is 1 MHz. The BS transmit power is $P_b = 35$ dBm for all $b \in \mathcal{B}$, the threshold is $\delta_k = 0.5$, and $\beta_k = 1$ for all $k \in \mathcal{K}$. $\gamma_b$ is the same for all $b \in \mathcal{B}$. We consider Bernoulli arrival processes for the renewable energy and bursty data arrivals. The maximum buffer size is $N_Q = 15$ pcks.

A. Effect of BS Coordination

From Theorem 2, we can see that for any given $N_E > 0$, the loading supported by the energy harvesting system $\lambda_{\max}^{Q*}(N_E, N_t)$ increases as $N_t$ increases. Intuitively, the gain comes from BS coordination. Fig. 2 illustrates the average delay versus the average transmit power cost for different numbers of transmit antennas $N_t$. It can be observed that the average delay of Baseline 2 and the proposed scheme decreases as $N_t$ increases. This demonstrates that BS coordination improves the delay performance.

B. Effect of Energy Buffer Size

From Theorem 2, we can see that for any given $N_t \geq 1$, the loading supported by the energy harvesting system $\lambda_{\max}^{Q*}(N_E, N_t)$ is non-decreasing in $N_E$. Specifically, when $x^*(N_E) < \min\{1, \frac{1}{C_{N_t} \lambda}\}$, $\lambda_{\max}^{Q*}(N_E, N_t)$ increases as $N_E$ increases. The intuition is that the above condition corresponds to the power-limited region. By increasing $N_E$, more renewable energy can be accumulated due to less renewable energy loss when the energy storage is full, and hence more traffic loading can be supported. However, when $x^*(N_E) \geq \min\{1, \frac{1}{C_{N_t} \lambda}\}$, $\lambda_{\max}^{Q*}(N_E, N_t)$ is constant for all $N_E > 0$. The intuition is that the above condition corresponds to the interference-limited region, in which the traffic loading supported cannot be increased by accumulating more renewable energy through increasing $N_E$. Fig. 3 illustrates the average delay versus the energy storage size $N_E$ at an average transmit grid power of 15 dBm. It can be observed that the average delay decreases as the energy storage size increases for all the schemes.

C. Performance of the Proposed Scheme 

Fig. 4 illustrates the average delay versus per-flow loading (average arrival rate $\lambda_k$). The average delay of all the schemes increases as the loading increases. The proposed scheme also achieves significant gain over the baselines across a wide range of input loading. Fig. 5 illustrates the convergence property of the proposed distributed online learning algorithm for estimating the per-flow potential function and the per-flow Q-factor. It can be observed that the proposed distributed learning algorithm converges quite fast. Furthermore, the average delay at the 500-th scheduling slot is 4.1853 pcks, which is much smaller than that of the baselines.

VIII. Summary 

In this paper, we propose a two-timescale delay-optimal BS-DTX control and user scheduling for energy harvesting downlink coordinated MIMO networks. We show that the two-timescale delay-optimal control problem can be modeled as a POMDP and derive the optimal centralized control. To reduce the complexity and facilitate the distributed implementation, we obtain a distributed solution with the BS-DTX control at the BSC based on the aggregation of the ESI and QSI and the user scheduling at each CM based on the per-cluster ESI, QSI and CSI with guaranteed delay performance. We prove the almost-sure convergence of the proposed distributed two-timescale algorithm. Furthermore, we analyze the stability conditions for the data queues in coordinated MIMO networks and discuss various design insights.

Appendix 

Appendix A: Proof of Lemma 1

We shall prove Lemma 1 using sample path arguments. Let $\{\mathbf{A}^E(\omega, t)\}$, $\{\mathbf{A}^Q(\omega, t)\}$ and $\{\mathbf{H}(\omega, t)\}$ be a given sample path (i.e., $\omega$) of energy arrivals, packet arrivals and CSI states. Let $\{\mathbf{p}(t)\}$ and $\{\mathbf{s}(t)\}$ be any given sequences of feasible BS-DTX control actions and user scheduling actions. Note that for given $\{\mathbf{p}(t)\}$ and $\{\mathbf{s}(t)\}$, the trajectory of QSI $\{\mathbf{Q}(\omega, t)\}$ is uniquely determined. Let $\{\mathbf{p}^E(\omega, t)\}$ and $\{\mathbf{p}^G(\omega, t)\}$ be the sequences of the renewable power DTX control actions and grid power DTX control actions satisfying the structure in Lemma 1 for the given $\{\mathbf{p}(t)\}$, i.e., $p_b^E(\omega, t) = p_b(t) I[E_b(\omega, t) > 0]$ and $p_b^G(\omega, t) = p_b(t) I[E_b(\omega, t) = 0]$, where $\{\mathbf{E}(\omega, t)\}$ is the trajectory of ESI associated with $\{\mathbf{p}^E(\omega, t)\}$. Let $\{\mathbf{p}^{E'}(\omega, t)\}$ and $\{\mathbf{p}^{G'}(\omega, t)\}$ be any other sequences of feasible renewable power DTX control actions and grid power DTX control actions conditioned on $\{\mathbf{p}(t)\}$, i.e., $p_b^{E'}(\omega, t) + p_b^{G'}(\omega, t) = p_b(t)$, and let $\{\mathbf{E}'(\omega, t)\}$ be the trajectory of ESI associated with $\{\mathbf{p}^{E'}(\omega, t)\}$.

In the following, for each $b \in \mathcal{B}$, we shall show that for $E_b'(\omega, 1) = E_b(\omega, 1)$, we have $\sum_{t=1}^{T} p_b^G(\omega, t) \leq \sum_{t=1}^{T} p_b^{G'}(\omega, t)$. Let $\Delta p_b^G(\omega, T) \triangleq \sum_{t=1}^{T-1} (p_b^{G'}(\omega, t) - p_b^G(\omega, t))$ $\forall T \geq 2$ and $\Delta p_b^G(\omega, 1) = 0$. Then, we have $\Delta p_b^G(\omega, t + 1) = \Delta p_b^G(\omega, t) + (p_b^{G'}(\omega, t) - p_b^G(\omega, t))$ for all $t \geq 1$. We shall prove $E_b(\omega, t) + \Delta p_b^G(\omega, t) \geq E_b'(\omega, t)$ and $\Delta p_b^G(\omega, t) \geq 0$ for all $t \geq 1$ by induction. (In the following proof, we omit $\omega$ for notational simplicity.)

• Consider $t = 1$. Since $E_b'(1) = E_b(1)$ and $\Delta p_b^G(1) = 0$ by the initial condition, we have $E_b(1) + \Delta p_b^G(1) \geq E_b'(1)$ and $\Delta p_b^G(1) \geq 0$.

• For some $t \geq 1$, assume $E_b(t) + \Delta p_b^G(t) \geq E_b'(t)$ and $\Delta p_b^G(t) \geq 0$. Note that $E_b'(t + 1) = \min\{E_b'(t) - p_b^{E'}(t) + A_b^E(t), N_E\}$. We shall show the conclusions hold for $t + 1$ by considering the following three cases. (1) When $E_b(t) > 0$, we have $p_b^E(t) = p_b(t)$ and $p_b^G(t) = 0$. Thus, we have $E_b(t + 1) = \min\{E_b(t) - p_b(t) + A_b^E(t), N_E\}$ and $\Delta p_b^G(t + 1) = \Delta p_b^G(t) + p_b^{G'}(t) \geq 0$. In addition, since $p_b^{G'}(t) = p_b(t) - p_b^{E'}(t)$, we have $E_b(t + 1) + \Delta p_b^G(t + 1) = \min\{E_b(t) - p_b(t) + A_b^E(t) + \Delta p_b^G(t) + p_b(t) - p_b^{E'}(t), N_E + \Delta p_b^G(t + 1)\} \geq E_b'(t + 1)$. (2) When $E_b(t) = 0$ and $E_b'(t) \geq 1$, which implies $\Delta p_b^G(t) \geq 1$, we have $p_b^E(t) = 0$ and $p_b^G(t) = p_b(t)$. Thus, we have $E_b(t + 1) = \min\{E_b(t) + A_b^E(t), N_E\}$ and $\Delta p_b^G(t + 1) = \Delta p_b^G(t) + p_b^{G'}(t) - p_b(t) = \Delta p_b^G(t) - p_b^{E'}(t) \geq 1 - 1 = 0$, and hence, we have $E_b(t + 1) + \Delta p_b^G(t + 1) = \min\{E_b(t) + A_b^E(t) + \Delta p_b^G(t) - p_b^{E'}(t), N_E + \Delta p_b^G(t + 1)\} \geq E_b'(t + 1)$. (3) When $E_b(t) = 0$ and $E_b'(t) = 0$, we have $p_b^E(t) = p_b^{E'}(t) = 0$ and $p_b^G(t) = p_b^{G'}(t) = p_b(t)$. Thus, we have $E_b(t + 1) = \min\{0 + A_b^E(t), N_E\}$, $E_b'(t + 1) = \min\{0 + A_b^E(t), N_E\}$ and $\Delta p_b^G(t + 1) = \Delta p_b^G(t) + 0 \geq 0$, and hence, $E_b(t + 1) + \Delta p_b^G(t + 1) = E_b'(t + 1) + \Delta p_b^G(t + 1) \geq E_b'(t + 1)$.

Therefore, by induction, we can show $\Delta p_b^G(\omega, t) \geq 0$ for all $t$. Since the average delay costs per stage are the same, we have

$$\sum_{t=1}^{T} \Big( \sum_{k} \beta_k f(Q_k(t)) + \sum_{b} \gamma_b p_b^G(t) \Big) \leq \sum_{t=1}^{T} \Big( \sum_{k} \beta_k f(Q_k(t)) + \sum_{b} \gamma_b p_b^{G'}(t) \Big)$$

for any given $\{\mathbf{p}(t)\}$, $\{\mathbf{s}(t)\}$ and $T$. By taking expectations over all sample paths, limsup and optimizations over the BS-DTX control and user selection policy space, we have $\min_{\pi} J^{\pi} \leq \min_{\pi'} J^{\pi'}$, where $\pi = \{\Omega^1, \Omega^2, \cdots\}$ with $\Omega^t$ satisfying the structure in Lemma 1.
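The sample-path claim of Lemma 1 (using renewable energy whenever the storage is non-empty never consumes more grid power than any other feasible split) can be checked by simulation for a single BS. The sketch below uses binary DTX actions and Bernoulli energy arrivals; the function `grid_usage`, the random alternative policy and all parameter values are illustrative assumptions rather than the paper's setup.

```python
import random


def grid_usage(p_seq, arrivals, n_e, e0, use_renewable):
    """Total grid power used over one sample path.

    use_renewable(t, e) decides whether an active slot draws from the storage;
    renewable power can only be used when the storage level e is positive.
    """
    e, grid = e0, 0
    for t, (p, a) in enumerate(zip(p_seq, arrivals)):
        p_e = 1 if (p == 1 and e > 0 and use_renewable(t, e)) else 0
        grid += p - p_e                 # the remainder is drawn from the grid
        e = min(e - p_e + a, n_e)       # storage dynamics with finite capacity
    return grid


rng = random.Random(0)
horizon, n_e = 200, 3
results = []
for _ in range(100):
    p_seq = [rng.randint(0, 1) for _ in range(horizon)]
    arrivals = [rng.randint(0, 1) for _ in range(horizon)]
    coin = [rng.random() < 0.5 for _ in range(horizon)]
    greedy = grid_usage(p_seq, arrivals, n_e, 0, lambda t, e: True)
    other = grid_usage(p_seq, arrivals, n_e, 0, lambda t, e, c=coin: c[t])
    results.append((greedy, other))

print(sum(g for g, _ in results), sum(o for _, o in results))
```

Both policies see the same DTX actions and energy arrivals, so the comparison is made path by path, matching the sample-path argument of the proof.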

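Appendix B: Proof of Lemma 2, Corollary 1 and Corollary 2

Lemma 2: Based on Definition 2, we can transform the POMDP into the MDP with a tuple of the following four objects: state space $\mathcal{E} \times \mathcal{Q}$, action space $\mathcal{P} \times \mathcal{S}$ with partitioned architecture $\{\Omega(\mathbf{E}, \mathbf{Q})\}$ according to Definition 2, transition kernel $\Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})]$, and per-stage cost function $g((\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q}))$. Since the Weak Accessibility (WA) condition holds under our problem setup, by Proposition 4.2.3 in [11], the optimal average cost of the transformed MDP is the same for all initial states. In addition, by Proposition 4.1.3 and Proposition 4.1.4 in [11], we know that the solution $(\theta, \{V(\mathbf{E}, \mathbf{Q})\})$ to the Bellman equation in (10) exists. By Proposition 4.2.1 in [11], we can complete the proof. ■

Corollary 1: Define $\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p}) \triangleq \min_{\Omega_s(\mathbf{E}, \mathbf{Q})} \big\{ g((\mathbf{E}, \mathbf{Q}), \mathbf{p}, \Omega_s(\mathbf{E}, \mathbf{Q})) + \sum_{(\mathbf{E}', \mathbf{Q}')} \Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), \mathbf{p}, \Omega_s(\mathbf{E}, \mathbf{Q})] V(\mathbf{E}', \mathbf{Q}') \big\} - \theta$. Thus, we have $V(\mathbf{E}, \mathbf{Q}) = \min_{\mathbf{p} \in \mathcal{P}} \mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p})$. Based on (10), we can obtain (12), which is in terms of the BS-DTX control Q-factor $\{\mathbb{Q}(\mathbf{E}, \mathbf{Q}, \mathbf{p})\}$. From Lemma 2, we have the optimal BS-DTX control action given by (11). ■

Corollary 2: Based on (10), we can obtain (14). For any $(\mathbf{E}, \mathbf{Q}) \in \mathcal{E} \times \mathcal{Q}$, as $\mathbf{p}^* = \Omega_p^*(\mathbf{E}, \mathbf{Q})$ can be obtained by (11), we can obtain $\Omega_s^*(\mathbf{E}, \mathbf{Q}, \mathbf{H})$ by solving the R.H.S. of (14) under $\mathbf{p}^*$ for any $\mathbf{A}^E$ and $\mathbf{A}^Q$ as follows:

$$\min_{\Omega_s(\mathbf{E}, \mathbf{Q})} \Big\{ g\big((\mathbf{E}, \mathbf{Q}), (\mathbf{p}^*, \Omega_s(\mathbf{E}, \mathbf{Q}))\big) + \sum_{(\mathbf{E}', \mathbf{Q}')} \Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), (\mathbf{p}^*, \Omega_s(\mathbf{E}, \mathbf{Q}))] U(\mathbf{E}', \mathbf{Q}') \Big\}$$

$$\overset{(a)}{\Leftrightarrow} \min_{\Omega_s(\mathbf{E}, \mathbf{Q})} \Big\{ \sum_{(\mathbf{E}', \mathbf{Q}')} \mathbb{E}\big[\Pr[(\mathbf{E}', \mathbf{Q}') | \mathbf{x}, (\mathbf{p}^*, \Omega_s(\mathbf{x}))] \,\big|\, (\mathbf{E}, \mathbf{Q})\big] U(\mathbf{E}', \mathbf{Q}') \Big\}, \quad \forall (\mathbf{E}, \mathbf{Q}) \in \mathcal{E} \times \mathcal{Q}$$

$$\overset{(b)}{\Leftrightarrow} \min_{\Omega_s(\mathbf{x})} \Big\{ \sum_{(\mathbf{E}', \mathbf{Q}')} \Pr[(\mathbf{E}', \mathbf{Q}') | \mathbf{x}, (\mathbf{p}^*, \Omega_s(\mathbf{x}))] U(\mathbf{E}', \mathbf{Q}') \Big\}, \quad \forall \mathbf{x} \in \mathcal{X}$$

$$\overset{(c)}{\Leftrightarrow} \min_{\mathbf{s} \in \mathcal{S}(\mathbf{p}^*)} \Big\{ \sum_{\mathbf{d} \in \mathcal{D}} \Big( \prod_{k \in \mathcal{K}} \big(1 - d_k - (-1)^{d_k} \Pr[\rho_k(\mathbf{H}, \mathbf{p}^*, \mathbf{s}) > \delta_k]\big) \Big) U\big([\mathbf{E} - \mathbf{p}^*]^+, [\mathbf{Q} - \mathbf{d}]^+\big) \Big\} \tag{29}$$

where (a) is due to the definition of $g(\cdot, \cdot)$ and $\Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), \Omega(\mathbf{E}, \mathbf{Q})]$, (b) is due to Definition 2, and (c) is due to Assumptions 3 and 4 as well as $E_b - p_b^{E*} = E_b - p_b^* I[E_b > 0] = [E_b - p_b^*]^+$. ■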

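Appendix C: Proof of Lemma 3

We shall prove the additive property w.r.t. the potential function. Following the proofs of Corollary 1 and Corollary 2, the additive property can be easily extended to the Q-factor and the post-decision potential function. Let $\hat{\theta}$ and $\hat{V}(\mathbf{E}, \mathbf{Q})$ be the average cost and the potential function under $\hat{\Omega}$. Then, we have the following Bellman equation in terms of $(\hat{\theta}, \{\hat{V}(\mathbf{E}, \mathbf{Q})\})$:

$$\hat{\theta} + \hat{V}(\mathbf{E}, \mathbf{Q}) = \mathbb{E}^{\hat{\Omega}_p}\big[\hat{g}((\mathbf{E}, \mathbf{Q}), \mathbf{p})\big] + \sum_{(\mathbf{E}', \mathbf{Q}')} \mathbb{E}^{\hat{\Omega}_p}\big[\hat{\Pr}[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), \mathbf{p}]\big] \hat{V}(\mathbf{E}', \mathbf{Q}') \tag{30}$$

where $\hat{g}((\mathbf{E}, \mathbf{Q}), \mathbf{p}) = g((\mathbf{E}, \mathbf{Q}), \mathbf{p})$ and $\hat{\Pr}[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), \mathbf{p}] = \mathbb{E}\big[\mathbb{E}^{\hat{\Omega}_s}[\Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}, \mathbf{H}), \mathbf{p}, \mathbf{s}] \,|\, \mathbf{p}, \mathbf{H}] \,\big|\, \mathbf{p}\big]$. Let $\theta_k$ and $V_k(E_b, Q_k)$ be the per-flow average cost and potential function under $\hat{\Omega}$. Then, we have the following per-flow fixed point equation in terms of $(\theta_k, \{V_k(E_b, Q_k)\})$:

$$\theta_k + V_k(E_b, Q_k) = \mathbb{E}^{\hat{\Omega}_p}\big[g_k((E_b, Q_k), p_b)\big] + \sum_{(E_b', Q_k')} \mathbb{E}^{\hat{\Omega}_p}\big[\Pr[(E_b', Q_k') | (E_b, Q_k), \mathbf{p}]\big] V_k(E_b', Q_k') \tag{31}$$

Under $\hat{\Omega}$, the induced Markov chain has a single recurrent class. Therefore, the solutions to (30) and (31) exist, respectively. First, we have $\mathbb{E}^{\hat{\Omega}_p}[\hat{g}((\mathbf{E}, \mathbf{Q}), \mathbf{p})] = \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} \mathbb{E}^{\hat{\Omega}_p}[g_k((E_b, Q_k), p_b)]$. Second, by the relationship between the joint distribution and the marginal distribution, summing the joint transition probability $\hat{\Pr}[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), \mathbf{p}]$ over all components of $(\mathbf{E}', \mathbf{Q}')$ other than $(E_b', Q_k')$ gives the marginal $\Pr[(E_b', Q_k') | (\mathbf{E}, \mathbf{Q}), \mathbf{p}] = \Pr[(E_b', Q_k') | (E_b, Q_k), \mathbf{p}]$. Therefore, substituting $\hat{\theta} = \sum_{k \in \mathcal{K}} \theta_k$ and $\hat{V}(\mathbf{E}, \mathbf{Q}) = \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} V_k(E_b, Q_k)$ into (30), we can see that the equality holds. Therefore, we complete the proof.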

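Appendix D: Proof of Lemma 4
Using the approximation in (19) and (29), we have

$$\min_{\mathbf{s} \in \mathcal{S}(\mathbf{p}^*)} \Big\{ \sum_{\mathbf{d} \in \mathcal{D}} \Big( \prod_{k \in \mathcal{K}} \big(1 - d_k - (-1)^{d_k} \Pr[\rho_k > \delta_k]\big) \Big) \Big( \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} U_k\big([E_b - p_b^*]^+, [Q_k - d_k]^+\big) \Big) \Big\}$$

$$= \min_{\mathbf{s} \in \mathcal{S}(\mathbf{p}^*)} \Big\{ \sum_{b \in \mathcal{B}} \sum_{k \in \mathcal{K}_b} \sum_{d_k \in \mathcal{D}_k} \big(1 - d_k - (-1)^{d_k} \Pr[\rho_k > \delta_k]\big) U_k\big([E_b - p_b^*]^+, [Q_k - d_k]^+\big) \Big\}$$

$$\Leftrightarrow \min_{\mathbf{s}_n \in \mathcal{S}_n(\mathbf{p}_n^*)} \sum_{b \in \mathcal{B}_n} \sum_{k \in \mathcal{K}_b} \sum_{d_k \in \mathcal{D}_k} \big(1 - d_k - (-1)^{d_k} \Pr[\rho_k > \delta_k]\big) U_k\big([E_b - p_b^*]^+, [Q_k - d_k]^+\big), \quad \forall n$$

$$\Leftrightarrow \min_{\mathbf{s}_n \in \mathcal{S}_n(\mathbf{p}_n^*)} \sum_{b \in \mathcal{B}_n} \sum_{k \in \mathcal{K}_b} \Big( (1 - s_k) U_k\big([E_b - p_b^*]^+, Q_k\big) + s_k \big( \Pr[\rho_k > \delta_k] U_k\big([E_b - p_b^*]^+, [Q_k - 1]^+\big) + (1 - \Pr[\rho_k > \delta_k]) U_k\big([E_b - p_b^*]^+, Q_k\big) \big) \Big)$$

$$\Leftrightarrow \min_{\mathbf{s}_n \in \mathcal{S}_n(\mathbf{p}_n^*)} \sum_{b \in \mathcal{B}_n} \sum_{k \in \mathcal{K}_b} s_k \Pr[\rho_k > \delta_k] \Big( U_k\big([E_b - p_b^*]^+, [Q_k - 1]^+\big) - U_k\big([E_b - p_b^*]^+, Q_k\big) \Big)$$

which is equivalent to the maximization in (20).

Appendix E: Proof of Theorem 1

Under Assumptions 3 and 4 as well as $\hat{\Omega}$ in Definition 3, the Markov chain $\{(\mathbf{E}(t), \mathbf{Q}(t))\}$ has a single recurrent class (and possibly some transient states). Thus, $\hat{\Omega}$ is a unichain policy. In addition, it is obvious that $\Omega^* \neq \hat{\Omega}$. Therefore, the conditions of Proposition 4.4.2 in [11] are satisfied except for the assumption that $\Omega^*$ is a unichain policy. We shall modify the proof of Proposition 4.4.2 to incorporate a general $\Omega^*$ as follows. We adopt the same notations as Proposition 4.4.2 ($\mu$ can be treated as $\hat{\Omega}$ and $\bar{\mu}$ can be treated as $\Omega^*$). Let $(\bar{\lambda}, h_{\bar{\mu}})$ be the gain-bias pair of a general $\bar{\mu}$. Thus, by Proposition 4.1.9, $(\bar{\lambda}, h_{\bar{\mu}})$ satisfies $\bar{\lambda} = P_{\bar{\mu}} \bar{\lambda}$ and $\bar{\lambda} + h_{\bar{\mu}} = T_{\bar{\mu}} h_{\bar{\mu}}$. In contrast, let $(\lambda, h_{\mu})$ be the gain-bias pair of a unichain $\mu$, which satisfies $\lambda e + h_{\mu} = T_{\mu} h_{\mu}$. Since $\Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), (\mathbf{p}, \mathbf{s})] \neq \Pr[(\mathbf{E}', \mathbf{Q}') | (\mathbf{E}, \mathbf{Q}), (\mathbf{p}', \mathbf{s}')]$, there is strict performance improvement under $\Omega^*$ over $\hat{\Omega}$. Thus, we have a stronger result than (4.97), i.e., $\delta(i) > 0$ $\forall i$. To incorporate a general $\bar{\mu}$, we have $\delta = (\lambda e - \bar{\lambda}) + (I - P_{\bar{\mu}}) \Delta$ instead of (4.98). Since $\bar{\lambda} = P_{\bar{\mu}} \bar{\lambda}$, we have $\sum_{k=0}^{N-1} P_{\bar{\mu}}^k \delta = N(\lambda e - \bar{\lambda}) + (I - P_{\bar{\mu}}^N) \Delta$ instead of (4.99), which implies $P_{\bar{\mu}}^* \delta = \lambda e - \bar{\lambda}$ instead of (4.100). Since $\delta(i) > 0$ $\forall i$, we have $\lambda > \bar{\lambda}(i)$ $\forall i$. In other words, we can show $\theta^*(\mathbf{E}, \mathbf{Q}) < \hat{\theta}$ for all $(\mathbf{E}, \mathbf{Q}) \in \mathcal{E} \times \mathcal{Q}$.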

Appendix F: Proof of Lemma 5

Note that the update equations in (21) and (22) can be treated as the synchronous stochastic versions of the synchronous relative value iterations (RVI) [11] for the Markov chains $\{(E_b(t), Q_k(t), \mathbf{p}(t))\}$ and $\{(\tilde{E}_b(t), \tilde{Q}_k(t))\}$ with the policy space containing only one policy $\hat{\Omega}$ [11]. Under $\hat{\Omega}$ defined in Definition 3, the two Markov chains have a single recurrent class (and possibly some transient states). Therefore, the condition of Lemma 2 in [24] holds, according to the explanation for the conditions of Proposition 4.3.2 in [11]. Following the proof of Lemma 2 in [24], which is a modified version of the proof for Proposition 4.3.2 in [11], we can prove Lemma 5. We omit the details here due to the page limit.

Appendix G: Proof of Lemma 6

From the conditional coverage probability (conditioned on the nearest BS being at a distance $r_1$ from the randomly chosen MS) for cellular networks without BS coordination obtained in [12], we have the conditional successful packet transmission probability of the randomly chosen MS given by $p_s(r_1, \lambda') \geq \exp\big(-C_1 \lambda' - \frac{N_0}{P} \delta r_1^{\alpha}\big)$, where $\lambda'$ is the density of the homogeneous PPP used to model the locations of active BSs and the inequality is due to $\eta(x, \alpha) \triangleq \int_{x^{-2/\alpha}}^{\infty} \frac{1}{1 + u^{\alpha/2}} du < \int_{x^{-2/\alpha}}^{\infty} u^{-\alpha/2} du = \frac{2}{\alpha - 2} x^{1 - \frac{2}{\alpha}} \triangleq \bar{\eta}(x, \alpha)$.

Next, we shall show sufficiency by proving that (25) guarantees stability in a parallel dominant network, in which dummy packets are transmitted when a data queue is empty. Sending dummy packets is only aimed to cause interference to the other MSs and is not counted as an actual packet transmission. The dominant system stochastically dominates the original system in the sense that the queue sizes and grid power costs in that system are necessarily no smaller than those in the original system. Therefore, the stability conditions obtained for the dominant system are sufficient for the stability of the original system. In the dominant system, since $\Pr[p_b = 1] = p_{tx}$, we have $\lambda' = p_{tx} \lambda$. Therefore, the service rate of the randomly chosen MS is $\mu(p_{tx}, \lambda) = p_{tx} p_s(r_1, \lambda')$. By Loynes' Theorem, the queue of the randomly chosen MS is stable if $\lambda^Q < \mu(p_{tx}, \lambda)$. Thus, we complete the proof for (25). Note that $E_b$ is decoupled from $Q_k$ and forms a discrete-time M/M/1/$N_E$ system with arrival rate $\lambda^E$ and departure rate $p_{tx}$. By queueing theory, we have $\Pr[E_b > 0] = f(p_{tx}, N_E)$. Thus, we can prove that the average grid power cost in the dominant system is $p_{\max}^G(p_{tx}, N_E) = (1 - f(p_{tx}, N_E)) p_{tx}$.

Appendix H: Proof of Lemma[7] 

In the following proof, we shall focus on the derivation of the conditional successful packet 
transmission probability ps(ri^ X\ Nt). The remaining proof is similar to that in the proof of 
Lemma [6l Let bi denote the i-th nearest BS among all the BSs (including those are on and off) 
to the randomly chosen MS /cq, where z = 1, • • • , Nf. Thus, bi is the BS of MS /cq. By forming a 
cluster Bq = {61, • • • , ^at,} C MS ko can achieve the highest Ps(ri, A, A', Nt). We shall calculate 
Ps(ri, A, A', A^t) under the favorable cluster Bq C ^. Let Ri and Rn^ denote the distance between 
BS 61 and MS ko as well as the distance between BS ^a^^ and MS ko. First, we shall derive the 
conditional p.d.f. /i^^ji^^ (^A^Jri) and the conditional expectation E = ri]. If tb < ri, 

we have Pt[Rn^ > rN^\Ri = ri] = 1 ^ fRN^\Ri{^Nt\^i) = 0. It remains to consider r^^ > ri. Let 
;B2(0,r) denote the 2-dim ball centered in the origin with radius r. Following similar techniques in 
ll25]| , we have, for r^^ > ri, 

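$$
\begin{aligned}
y &= \Pr[R_{N_t} > r_{N_t} \mid R_1 = r_1] = \Pr[0, 1, \cdots, N_t - 2 \text{ BSs in } \mathcal{B}_2(0, r_{N_t}) - \mathcal{B}_2(0, r_1)] \\
&= \sum_{i=0}^{N_t-2} \frac{\left(\lambda\pi(r_{N_t}^2 - r_1^2)\right)^i}{i!} \exp\left(-\lambda\pi(r_{N_t}^2 - r_1^2)\right)
\end{aligned}
$$

$$
\begin{aligned}
\Rightarrow f_{R_{N_t}|R_1}(r_{N_t}|r_1) &= -\frac{dy}{dr_{N_t}} = 2\lambda\pi r_{N_t} \exp\left(-\lambda\pi(r_{N_t}^2 - r_1^2)\right) \sum_{i=0}^{N_t-2} \frac{\left(\lambda\pi(r_{N_t}^2 - r_1^2)\right)^i}{i!} \\
&\quad - \exp\left(-\lambda\pi(r_{N_t}^2 - r_1^2)\right) \sum_{i=1}^{N_t-2} \frac{2\lambda\pi r_{N_t}\left(\lambda\pi(r_{N_t}^2 - r_1^2)\right)^{i-1}}{(i-1)!} \\
&= 2\lambda\pi r_{N_t} \exp\left(-\lambda\pi(r_{N_t}^2 - r_1^2)\right) \frac{\left(\lambda\pi(r_{N_t}^2 - r_1^2)\right)^{N_t-2}}{(N_t-2)!}, \quad r_{N_t} > r_1
\end{aligned}
\tag{32}
$$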
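As a sanity check on (32), the following sketch evaluates the conditional tail probability as a truncated Poisson sum and compares its numerical derivative with the closed-form p.d.f. The function names and the parameter values ($\lambda = 1$, $r_1 = 0.3$, $N_t = 4$) are hypothetical and chosen only for illustration.

```python
import math

def ccdf_r_nt(r, r1, lam, n_t):
    """Pr[R_{N_t} > r | R_1 = r1]: at most n_t - 2 BSs fall in the annulus."""
    if r <= r1:
        return 1.0
    a = lam * math.pi * (r * r - r1 * r1)   # mean number of BSs in the annulus
    return math.exp(-a) * sum(a ** i / math.factorial(i) for i in range(n_t - 1))

def pdf_r_nt(r, r1, lam, n_t):
    """Closed-form conditional p.d.f. as written in (32)."""
    if r <= r1:
        return 0.0
    a = lam * math.pi * (r * r - r1 * r1)
    return (2 * lam * math.pi * r * math.exp(-a)
            * a ** (n_t - 2) / math.factorial(n_t - 2))

lam, r1, n_t = 1.0, 0.3, 4
r, h = 0.9, 1e-6
# Central difference of the tail probability, with the sign flipped
numeric_pdf = -(ccdf_r_nt(r + h, r1, lam, n_t) - ccdf_r_nt(r - h, r1, lam, n_t)) / (2 * h)
print(numeric_pdf, pdf_r_nt(r, r1, lam, n_t))
```

The two printed values should agree up to the finite-difference error, which reflects the telescoping of the two sums in (32).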

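$$
\begin{aligned}
\Rightarrow E[R_{N_t}^{2-\alpha} \mid R_1 = r_1] &= \int_0^{\infty} r_{N_t}^{2-\alpha} f_{R_{N_t}|R_1}(r_{N_t}|r_1)\, dr_{N_t} \\
&\overset{(a)}{\le} \frac{(\lambda\pi)^{N_t-1}}{(N_t-2)!} \int_0^{\infty} \exp(-\lambda\pi u)\, u^{N_t - \frac{\alpha}{2} - 1}\, du = (\lambda\pi)^{\frac{\alpha}{2}-1} \frac{\Gamma\left(N_t - \frac{\alpha}{2}\right)}{(N_t-2)!}
\end{aligned}
\tag{33}
$$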



where (a) is due to $\alpha > 2$ and the change of variables $u = r_{N_t}^2 - r_1^2$. (a) is tight for small $r_1$. In
addition, $E[R_{N_t}^{2-\alpha} \mid R_1 = r_1] \le r_1^{2-\alpha}$.
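The two upper bounds on the conditional expectation can be compared against a direct numerical integration. The sketch below is illustrative only: it integrates in the variable $u = r_{N_t}^2 - r_1^2$ with a midpoint rule, and the function names, step count, truncation point and parameter values are assumptions rather than anything specified in the paper. It requires $N_t > \alpha/2$ so that the Gamma function in the bound is finite.

```python
import math

def cond_moment(r1, lam, n_t, alpha, steps=200_000):
    """Midpoint-rule estimate of E[R_{N_t}^(2-alpha) | R_1 = r1].

    With u = r^2 - r1^2, u has density c*exp(-c*u)*(c*u)^(n_t-2)/(n_t-2)!
    where c = lam*pi, and the integrand is (u + r1^2)^((2-alpha)/2).
    """
    c = lam * math.pi
    u_max = 60.0 / c                      # tail beyond this is negligible
    du = u_max / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * du
        weight = c * math.exp(-c * u) * (c * u) ** (n_t - 2) / math.factorial(n_t - 2)
        total += (u + r1 * r1) ** ((2 - alpha) / 2) * weight * du
    return total

def gamma_bound(lam, n_t, alpha):
    """Bound obtained by dropping r1^2 inside the power; needs n_t > alpha/2."""
    return ((lam * math.pi) ** (alpha / 2 - 1)
            * math.gamma(n_t - alpha / 2) / math.factorial(n_t - 2))

lam, n_t, alpha = 1.0, 3, 3.0
moment = cond_moment(0.5, lam, n_t, alpha)
print(moment, gamma_bound(lam, n_t, alpha), 0.5 ** (2 - alpha))
moment_small_r1 = cond_moment(1e-3, lam, n_t, alpha)
print(moment_small_r1, gamma_bound(lam, n_t, alpha))
```

For the small value of $r_1$, the numerical expectation approaches the Gamma-function bound, which is consistent with the remark that (a) is tight for small $r_1$.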

Next, we shall calculate $p_s(r_1, \lambda, \lambda', N_t)$. Note that the interference to MS $k_0$ comes from the active
BSs in $\Phi' - \mathcal{B}_0 \cap \Phi'$. In addition, the signal power $G_1$ and interference power $G_b$ (from the active
BS $b \in \Phi' - \mathcal{B}_0 \cap \Phi'$) due to small scale fading are exponentially distributed with mean 1 [26]. Let
$I_{R_{N_t}}$ denote the interference, which is a function of the random variable $R_{N_t}$. Therefore, we have

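$$
\begin{aligned}
p_s(r_1, \lambda, \lambda', N_t) &= \Pr[\mathrm{SINR} > \delta \mid R_1 = r_1] = \Pr\left[\frac{P G_1 r_1^{-\alpha}}{N_0 + I_{R_{N_t}}} > \delta\right] \\
&= E_{R_{N_t}}\left[E_{I_{R_{N_t}}}\left[\Pr\left[G_1 > \frac{1}{P}\delta r_1^{\alpha}\left(N_0 + I_{R_{N_t}}\right) \,\Big|\, I_{R_{N_t}}\right]\right]\right] \\
&= E_{R_{N_t}}\left[E_{I_{R_{N_t}}}\left[\exp\left(-\frac{1}{P}\delta r_1^{\alpha}\left(N_0 + I_{R_{N_t}}\right)\right)\right]\right] \\
&= \exp\left(-\frac{N_0}{P}\delta r_1^{\alpha}\right) E_{R_{N_t}}\left[E_{I_{R_{N_t}}}\left[\exp\left(-\frac{1}{P}\delta r_1^{\alpha} I_{R_{N_t}}\right)\right]\right]
\end{aligned}
\tag{34}
$$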



Let $s = \frac{1}{P}\delta r_1^{\alpha}$ and let $R_b$ denote the distance between BS $b \in \Phi' - \mathcal{B}_0 \cap \Phi'$ and MS $k_0$. We have



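$$
\begin{aligned}
E_{I_{R_{N_t}}}\left[\exp\left(-s I_{R_{N_t}}\right)\right] &= E_{\Phi', \{G_b\}}\left[\prod_{b \in \Phi' - \mathcal{B}_0 \cap \Phi'} \exp\left(-sPG_bR_b^{-\alpha}\right)\right] \\
&= E_{\Phi'}\left[\prod_{b \in \Phi' - \mathcal{B}_0 \cap \Phi'} E_{\{G_b\}}\left[\exp\left(-sPG_bR_b^{-\alpha}\right)\right]\right] = E_{\Phi'}\left[\prod_{b \in \Phi' - \mathcal{B}_0 \cap \Phi'} \frac{1}{1 + sPR_b^{-\alpha}}\right] \\
&= \exp\left(-2\lambda'\pi \int_{R_{N_t}}^{\infty} \left(1 - \frac{1}{1 + sPv^{-\alpha}}\right) v\, dv\right) = \exp\left(-2\lambda'\pi \int_{R_{N_t}}^{\infty} \frac{v}{1 + \frac{v^{\alpha}}{sP}}\, dv\right) \\
&\overset{(a)}{=} \exp\left(-2\lambda'\pi \int_{R_{N_t}}^{\infty} \frac{v}{1 + \left(\frac{v}{r_1\delta^{1/\alpha}}\right)^{\alpha}}\, dv\right) \overset{(b)}{=} \exp\left(-\lambda'\pi r_1^2 \delta^{\frac{2}{\alpha}}\, \eta\left(\left(\frac{R_{N_t}}{r_1\delta^{1/\alpha}}\right)^2, \alpha\right)\right)
\end{aligned}
$$

$$
\begin{aligned}
E_{R_{N_t}}\left[E_{I_{R_{N_t}}}\left[\exp\left(-\frac{1}{P}\delta r_1^{\alpha} I_{R_{N_t}}\right)\right]\right] &\overset{(c)}{\ge} \exp\left(-\lambda'\pi r_1^2 \delta^{\frac{2}{\alpha}} E_{R_{N_t}}\left[\eta\left(\left(\frac{R_{N_t}}{r_1\delta^{1/\alpha}}\right)^2, \alpha\right) \,\Big|\, R_1 = r_1\right]\right) \\
&\overset{(d)}{\ge} \exp\left(-\lambda'\pi r_1^2 \delta^{\frac{2}{\alpha}} E_{R_{N_t}}\left[\tilde{\eta}\left(\left(\frac{R_{N_t}}{r_1\delta^{1/\alpha}}\right)^2, \alpha\right) \,\Big|\, R_1 = r_1\right]\right) \\
&= \exp\left(-\lambda' \frac{2\pi}{\alpha - 2} r_1^{\alpha} \delta\, E_{R_{N_t}}\left[R_{N_t}^{2-\alpha} \mid R_1 = r_1\right]\right) \\
&\overset{(e)}{\ge} \exp\left(-C_{N_t}\lambda'\right)
\end{aligned}
\tag{35}
$$
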

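where (a) is due to plugging in $s = \frac{1}{P}\delta r_1^{\alpha}$, (b) is due to the change of variables $u = \left(\frac{v}{r_1\delta^{1/\alpha}}\right)^2$,
(c) is due to the convexity of the exponential function, (d) is due to $\eta(x, \alpha) = \int_x^{\infty} \frac{1}{1 + u^{\alpha/2}}\, du <
\int_x^{\infty} u^{-\frac{\alpha}{2}}\, du = \frac{2}{\alpha - 2} x^{1 - \frac{\alpha}{2}} = \tilde{\eta}(x, \alpha)$, and (e) is due to inequality (33) and $E[R_{N_t}^{2-\alpha} \mid R_1 = r_1] \le r_1^{2-\alpha}$.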
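Step (d) can be illustrated numerically. The sketch below evaluates $\eta(x, \alpha)$ after the substitution $w = u^{1 - \alpha/2}$, which maps the infinite range to the finite interval $(0, x^{1-\alpha/2})$ and gives $\eta(x, \alpha) = \frac{2}{\alpha - 2}\int_0^{x^{1-\alpha/2}} \frac{dw}{1 + w^{\alpha/(\alpha-2)}}$. This substitution and the function names are choices made here for the illustration and do not appear in the paper. For $\alpha = 4$ the integral has the closed form $\arctan(1/x)$, which is used as a reference value.

```python
import math

def eta(x, alpha, steps=20_000):
    """eta(x, alpha) = int_x^inf du / (1 + u^(alpha/2)), for x > 0, alpha > 2.

    Computed with a midpoint rule after the substitution w = u^(1 - alpha/2).
    """
    w_max = x ** (1 - alpha / 2)
    dw = w_max / steps
    q = alpha / (alpha - 2)
    total = sum(1.0 / (1.0 + ((j + 0.5) * dw) ** q) for j in range(steps)) * dw
    return 2.0 / (alpha - 2.0) * total

def eta_upper(x, alpha):
    """Upper bound obtained by replacing 1 + u^(alpha/2) with u^(alpha/2)."""
    return 2.0 / (alpha - 2.0) * x ** (1.0 - alpha / 2.0)

for x in (0.5, 1.0, 2.0, 5.0):
    print(x, eta(x, 3.0), eta_upper(x, 3.0))
```

In the transformed integral the integrand is strictly below 1, so the bound $\eta(x, \alpha) < \tilde{\eta}(x, \alpha)$ can be read off directly; the gap shrinks as $x$ grows.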

Substituting (35) into (34), we have $p_s(r_1, \lambda, \lambda', N_t) \ge \exp\left(-C_{N_t}\lambda' - \frac{N_0}{P}\delta r_1^{\alpha}\right)$. Since $\lambda' = p_{tx}\lambda$, we
can prove (27).

Appendix I: Proof of Theorem 2



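$$
\frac{\partial}{\partial p_{tx}}\left[p_{tx}\exp\left(-C_{N_t}p_{tx}\lambda - \frac{N_0}{P}\delta r_1^{\alpha}\right)\right] = \left(1 - C_{N_t}\lambda p_{tx}\right)\exp\left(-C_{N_t}p_{tx}\lambda - \frac{N_0}{P}\delta r_1^{\alpha}\right)
\begin{cases}
> 0, & p_{tx} < \frac{1}{C_{N_t}\lambda} \\
< 0, & p_{tx} > \frac{1}{C_{N_t}\lambda}
\end{cases}
\tag{36}
$$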
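The sign pattern in (36) implies that $p_{tx}\exp\left(-C_{N_t}p_{tx}\lambda - \frac{N_0}{P}\delta r_1^{\alpha}\right)$ is unimodal in $p_{tx}$ with its peak at $\frac{1}{C_{N_t}\lambda}$. The following sketch compares a brute-force grid search with the resulting closed-form maximiser. The numerical values of `c_nt`, `lam` and `noise_term` (standing for $C_{N_t}$, $\lambda$ and $\frac{N_0}{P}\delta r_1^{\alpha}$) are hypothetical, and the optional cap `x_star` is an assumed stand-in for an upper limit on $p_{tx}$ imposed by the grid power constraint.

```python
import math

def rate_lower_bound(p_tx, c_nt, lam, noise_term):
    """p_tx * exp(-C_{N_t} * p_tx * lam - noise_term), differentiated in (36)."""
    return p_tx * math.exp(-c_nt * p_tx * lam - noise_term)

def best_p_tx(c_nt, lam, x_star=1.0):
    """Maximiser over [0, min(1, x_star)] implied by the sign pattern in (36)."""
    return min(1.0, x_star, 1.0 / (c_nt * lam))

c_nt, lam, noise_term = 2.5, 1.0, 0.1      # illustrative values only
grid = [i / 1000 for i in range(1001)]     # candidate p_tx values in [0, 1]
p_grid = max(grid, key=lambda p: rate_lower_bound(p, c_nt, lam, noise_term))
print(p_grid, best_p_tx(c_nt, lam))
```

When $\frac{1}{C_{N_t}\lambda} \ge 1$, the function is increasing on the whole interval and the maximiser is the upper end of the feasible range.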



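$$
\frac{\partial}{\partial p_{tx}}\left[\left(1 - f(p_{tx}, N_E)\right)p_{tx}\right] = 1 - f(p_{tx}, N_E) - p_{tx}\frac{\partial f(p_{tx}, N_E)}{\partial p_{tx}} > 0
$$

where the last inequality is due to $\frac{\partial f(p_{tx}, N_E)}{\partial p_{tx}} < 0$ and $f(p_{tx}, N_E) < 1$. In addition, $p_{tx} \in [0, 1]$.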
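Therefore, we can easily obtain $p_{tx}^*(N_E, N_t)$. Next, we shall prove the property of $\lambda_{\max}(N_E, N_t)$ w.r.t.
$N_E$. It is obvious that $x^*(N_E)$ increases with $N_E$. If $x^*(N_E) < \min\{1, \frac{1}{C_{N_t}\lambda}\}$, then $p_{tx}^*(N_E) = x^*(N_E)$
and the derivative in (36) is positive. Thus, $\lambda_{\max}(N_E, N_t)$ is increasing in $N_E$. If $x^*(N_E) \ge \min\{1, \frac{1}{C_{N_t}\lambda}\}$,
then $p_{tx}^*(N_E) = \min\{1, \frac{1}{C_{N_t}\lambda}\}$ and $\lambda_{\max}(N_E, N_t)$ is a constant for all $N_E$. Finally, we shall show the
property of $\lambda_{\max}(N_E, N_t)$ w.r.t. $N_t$ for any given $N_E > 0$. It can be easily verified that $C_{N_t}$
is decreasing in $N_t$. Thus, when $p_{tx}^*(N_E, N_t) = x^*(N_E)$ or 1, we have that $\lambda_{\max}(N_E, N_t) =
p_{tx}^*(N_E)\exp\left(-C_{N_t}p_{tx}^*(N_E)\lambda - \frac{N_0}{P}\delta r_1^{\alpha}\right)$ is increasing in $N_t$. When $p_{tx}^*(N_E, N_t) = \frac{1}{C_{N_t}\lambda}$, we have
that $\lambda_{\max}(N_E, N_t) = \frac{1}{C_{N_t}\lambda}\exp\left(-1 - \frac{N_0}{P}\delta r_1^{\alpha}\right)$ is increasing in $N_t$.

(a) System Model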



(b) Control Architecture 



Fig. 1. System model and control architecture of the downlink coordinated MIMO systems. The dotted lines and solid
lines in Fig. 1(b) denote the control path and data path, respectively.




Average Transmit AC Power (dBm) 



Fig. 2. Average delay versus average transmit grid power (dBm) at $N_t = 2$ and $N_t = 4$. The average data arrival rate is
0.4 pck/slot, the average renewable energy arrival rate is $\lambda_b^E = 0.5$ unit/slot, and the energy storage size is $N_E = 4$
units for all $k \in \mathcal{K}$ and $b \in \mathcal{B}$.

Energy Storage Size



Fig. 3. Average delay versus energy storage size at average transmit grid power 15 dBm. The average data arrival rate is
0.4 pck/slot and the average renewable energy arrival rate is $\lambda_b^E = 0.6$ unit/slot for all $k \in \mathcal{K}$ and $b \in \mathcal{B}$.

Fig. 4. Average delay versus average data arrival rate at average transmit grid power 25 dBm. The average renewable
energy arrival rate is $\lambda_b^E = 0.5$ unit/slot and the energy storage size is $N_E = 4$ for all $b \in \mathcal{B}$.

Fig. 5. Convergence property of the proposed distributed online learning algorithm at average transmit grid power 20
dBm. The average data arrival rate is 0.4 pck/slot, the average renewable energy arrival rate is $\lambda_b^E = 0.5$ unit/slot,
and the energy storage size is $N_E = 4$ for all $k \in \mathcal{K}$ and $b \in \mathcal{B}$. The figure illustrates the instantaneous per-flow post-decision
potential function value $U_k(\tilde{E}_b, \tilde{Q}_k)$ and the instantaneous per-flow Q-factor value $\mathbb{Q}_k(E_b, Q_k, \mathbf{p})$, respectively, during the
online iterative updates in (22) and (21), versus the instantaneous slot index, where $k = 1$, $b = 1$, $\tilde{E}_b = 1$, $\tilde{Q}_k = 1$, $E_b = 1$,
$Q_k = 1$ and $\mathbf{p} = \{p_b = 1 : b \in \mathcal{B}\}$. The boxes indicate the average delay performance of various schemes at the two
selected slot indices.